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Abstract 

The  research  objective  is  to  develop  algorithms  that  can  be  embedded  in  a  distributed 
coordination  and  control  architecture  for  teams  of  multiple  UAVs.  The  main  prob¬ 
lem  is  to  determine  the  role  of  errors  in  the  situational  awareness  (SA)  in  the  fleet 
coordination  problem  by  investigating  two  fundamental  questions:  how  does  specific 
sensor  information  impact  the  estimation  error,  and  how  does  this  estimation  error 
impact  the  control  performance?  The  answers  to  these  questions  will  provide  a  clear 
indication  of  the  impact  of  errors  in  the  SA  on  the  control  performance  and  should 
enable  the  design  of  an  efficient  (and  dynamic)  communication  architecture.  Our  re¬ 
search  has  resulted  in  several  algorithms  that  use  mixed-integer  linear  programming 
(MILP)  to  perform  both  the  activity  and  path  planning  components  of  the  team  co¬ 
ordination.  The  focus  of  recent  work  has  been  on  improving  the  robustness  of  the 
various  planning  algorithms  to  uncertainty  in  the  information  available.  We  have 
also  added  four  new  UAVs  to  the  multi-UAV  testbed  that  is  being  used  to  evaluate 
various  distributed  control  architectures. 

Research  Status 

The  following  progress  has  been  made  in  the  past  year: 

1 .  Developed  new  distributed  task  assignment  algorithm  that  removes  inconsisten¬ 
cies  due  to  imperfect  information  across  the  UAV  team  by  communicating  the 
plans  to  resolve  conflicts  [1].  Simulation  results  show  that  the  new  approach  is 
more  efficient  than  simply  trying  to  fully  synchronize  the  situational  awareness. 

2.  Developed  new  theoretical  approach  to  optimal  search  that  captures  the  uncer¬ 
tainty  typically  present  in  a  priori  descriptions  of  the  targets  and  threats  in  the 
environment  [2], 

3.  Extended  our  Robust  Model  Predictive  Controller  (RMPC)  [3]  algorithm  to 
develop  a  “Robust  Safe  but  Knowledgeable”  (RSBK)  approach  [4].  Compared 
our  RMPC  algorithm  with  a  LMI-based  min-max  approach,  and  developed  a 
hybrid  algorithm  that  retains  the  advantages  of  both  [5]. 

4.  Augmented  the  previous  UAV  fleet  with  4  Monocoupe  90As  [9]. 

5.  Performed  an  experimental  demonstration  of  our  decentralized  MPC  algorithm 
using  three  rovers  [6,7].  Extended  the  RMPC  algorithm  to  a  variable  horizon 
formulation  that  is  less  restrictive  for  vehicle  maneuvering  problems  [8]. 

The  following  sections  provide  more  detail  on  the  main  topics. 


Fig.  1:  Left:  Robust  Distributed  Task  Assignment  algorithm  showing  the  two  stages 
of  communication  used.  Right:  Comparison  of  effectiveness  of  communication  in  the 
2  phases  of  the  RDTA  algorithm  (consensus  and  planning)  on  the  performance. 


Distributed  Task  Assignment:  Recent  work  has  focused  on  developing  a  new 
robust  distributed  task  assignment  (RDTA)  algorithm  for  a  team  of  UAVs.  Central¬ 
ized  planning  can  be  impractical  due  to  constraints  on  the  computation  (algorithm 
scalability)  and  communication  (robustness).  However,  task  planning  is  hard  to  de¬ 
centralize  because  of  the  strong  coupling  between  agents  in  the  choice  of  the  tasks. 
Distributed  task  allocation  methods  have  been  developed  that  synchronize  informa¬ 
tion  across  the  fleet,  have  each  agent  plan  by  running  a  centralized  method,  and 
then  implement  their  own  resulting  plan,  but  this  “implicit  coordination”  approach 
can  easily  fail  by  giving  conflicting  plans  if  the  synchronization  is  poor,  so  it  places 
stringent  requirements  on  the  communication  resources  [1],  Furthermore,  reaching 
consensus  is  not  always  possible  for  large  teams  in  a  complex,  rapidly  changing  en¬ 
vironment.  We  have  extended  the  basic  implicit  coordination  approach  to  achieve 
better  (i.e.,  more  consistent)  performance  with  imperfect  data  synchronization.  The 
resulting  Robust  Distributed  Task  Assignment  method  shown  in  Fig.  1  assumes  some 
degree  of  data  synchronization,  but  adds  a  second  planning  step  based  on  shared 
planning  data.  This  exchange  of  candidate  plans  guarantees  that  all  UAVs  have  the 
same  information  during  the  final  planning  step,  which  ensures  that  the  independently 
designed  plans  are  consistent.  The  approach  is  analogous  to  closing  a  synchronization 
loop  on  the  planning  process  to  reduce  the  sensitivity  to  exogenous  disturbances. 

The  performance  of  the  algorithm  was  investigated  for  a  scenario  with  5  UAVs  and 
10  targets  to  show  the  advantages  of  this  method  in  reducing  the  conflicts  in  the 
assignments,  generating  feasible  plans  and  increasing  the  performance  of  the  plan 
compared  to  implicit  coordination.  Fig.  1  compares  the  effect  of  communication 
(measured  in  words )  in  the  different  phases  of  the  planning  (reaching  consensus  in  SA, 
communicating  plans)  on  the  performance  of  the  assignment.  Note  that  in  this  plot, 
implicit  coordination  is  effectively  equivalent  to  communicating  2  words,  or  one  petal, 
in  the  planning  phase.  The  graph  shows  that  increasing  the  communication  during 
either  the  consensus  or  planning  phases  improves  the  team  performance.  To  maximize 
the  performance,  it  is  clear  that  some  communication  in  both  phases  is  needed,  but  the 


results  also  clearly  show  that  communication  in  the  planning  phase  is  more  efficient 
than  in  the  information  phase  in  the  sense  that  8  words  of  communication  in  the 
planning  phase  have  approximately  the  same  effect  on  performance  as  80  words  in  the 
consensus  phase.  This  improved  efficiency  indicates  that  RDTA  provides  a  tractable 
distributed  planning  approach  for  complex  task  allocation  problems. 


Robust  Search:  We  have  extended  traditional  search  algorithms  to  account  for 
uncertainty  in  the  initial  information.  Most,  if  not  all,  previous  research  assumes 
that  the  initial  target  information  (such  as  the  probability  of  target  existence  in  each 
grid  cell)  is  known  precisely.  This  assumption  is  very  restrictive,  particularly  in  highly 
uncertain  environments,  because  poor/conflicting  intelligence,  communication  delays, 
or  noisy  sensors  will  cause  uncertainty  in  the  information.  Our  algorithm  relaxes  the 
assumption  on  the  precision  of  the  initial  knowledge  by  generalizing  the  prior  in¬ 
formation  to  be  a  distribution  on  the  initial  distribution  [2].  The  Beta  distribution 
( f(x )  =  x{,_1(l  —  x)c _1,  x  €  [0, 1])  is  used  to  describe  the  uncertainty  in  the 

initial  probability  (i.e. ,  the  support  is  the  prior  probability  of  target  existence).  A 
key  benefit  of  using  Beta  distributions  is  the  analytical  tractability  since,  by  assuming 
a  very  general  Bernoulli  sensor  model,  the  Beta  distribution  is  a  conjugate  distribu¬ 
tion,  and  therefore  the  Beta  distribution  property  is  preserved  through  a  Bayesian 
measurement  update. 


This  new  search  formulation  was  used  121  to 

L  J 

develop  analytic  predictions  of  the  mean  time 
to  look  at  a  target  in  order  to  raise  the  confi¬ 
dence  of  a  target  existence  beyond  a  predefined 
threshold  a.  This  metric  (also  called,  looks 
or  glints)  is  useful  for  planning  because,  given 
the  prior  uncertainty  description  of  the  prior 
probability,  an  estimate  for  the  UAV  “time  on 
target”  can  be  obtained.  For  example,  consider  a  scenario  with  4  targets  in  the  envi¬ 
ronment  and  an  assumed  initial  probability  of  existence  of  50%;  a  target  is  considered 
detected  if  the  probability  exceeded  a  confidence  of  a  =  0.85;  and  the  sensor  error 
was  assumed  to  be  £  =  0.95  (probability  of  correct  detection).  Using  the  mean  time 
on  target  for  a  nominal  planning  scheme  (column  2  in  the  table),  each  target  exceeds 
the  prescribed  threshold  for  a  280-step  mission.  Our  new  robust  algorithm  developed 
a  196-step  mission  that  exceeded  the  confidence  for  each  target,  a  reduction  of  30% 
from  the  nominal. 


Mean  time  on  target:  a  =  0.85,  £  =  0.95 


Target 

&Nom 

&Ne  w 

1 

0.89 

0.85 

2 

0.90 

0.85 

3 

0.87 

0.85 

4 

0.85 

0.85 

Total  Time 

280 

196 

Trajectory  Optimization:  Model  Predictive  Control  (MPC)  is  a  powerful  tool 
for  the  UAV  guidance  problem  because  optimization-based  controllers  can  operate 
close  to  constraint  boundaries  to  obtain  better  performance  than  traditional  control 
approaches.  As  a  result,  however,  small  disturbances  could  drive  the  system  infeasible, 
so  we  must  account  for  external  disturbances  and  modeling  errors.  We  have  developed 
a  new  robust  MPC  (R.MPC)  approach  that  uses  constraint  tightening  (CT)  [15]  with 
a  more  general  candidate  policy,  thereby  leading  to  a  less  constrained  optimization 
and  hence  a  less  conservative  controller  [3,7,8,10].  The  approach  retains  “margin” 


Fig.  2  :  RSBK  compared  with  previous  algorithms,  which  fail  to  navigate  the  UAV  to 
the  goal  due  to  disturbances  or  local  minimum. 


for  future  feedback  action,  which  becomes  available  to  the  MPC  optimization  as  time 
progresses.  Since  robustness  follows  only  from  the  constraint  modifications,  only 
nominal  predictions  are  required,  avoiding  both  the  large  growth  in  problem  size 
associated  with  incorporating  multivariable  uncertainty  in  the  prediction  model  and 
the  conservatism  associated  with  worst  case  cost  predictions,  a  common  alternative. 

Robust  Safe  but  Knowledgeable  (RSBK)  Algorithm:  We  extended  the  RMPC  algo¬ 
rithm  to  design  trajectories  for  a  UAV  performing  long-range  maneuvers  [4].  RSBK 
uses  a  robust  control  invariant  admissible  set  as  the  terminal  set  of  each  plan  that 
does  not  need  to  be  a  target  set  of  the  overall  guidance  problem.  The  approach  is 
similar  to  the  previous  RH-MILP  algorithm,  where  the  controller  designs  a  detailed 
trajectory  over  a  short  planning  horizon  using  a  shortest-path  approximation  of  the 
cost-to-go.  For  RSBK  this  cost-to-go  estimate  is  based  on  the  tightened  constraints, 
so  that  the  future  maneuver  is  feasible,  even  with  the  disturbance  inputs.  Fig.  2 
shows  numerical  simulations  that  highlight  the  advantages  of  the  RSBK  algorithm: 
the  vehicle  is  guaranteed  to  remain  safe  under  the  influence  of  disturbances;  much 
longer  robust  trajectories  can  be  constructed  on-line.  Combined  with  our  Decentral¬ 
ized  MPC  (DMPC)  work  [7],  our  new  algorithm  can  be  used  to  generate  trajectories 
for  multiple  vehicles  maneuvering  in  a  complex  and  uncertain  environment. 

Hybrid  Robust  MPC  Algorithm:  Another  chal-  ^  30 
lenge  associated  with  RMPC  is  to  achieve  bet- 

2.5 

ter  performance  under  worse-case  disturbance 
inputs.  We  unified  our  constraint  tighten¬ 
ing  RMPC  algorithm  with  a  min-max  RMPC  15 
algorithm  that  uses  Linear  Matrix  Inequal-  10 
ities  (LMIs)  and  robust  optimization  tech-  05 
niques  [16]  to  achieve  robustness  and  perfor-  o.o 
mance  [5].  Although  apparently  different  in 
structure,  we  show  that  these  two  MPC  algo-  erf  comParison  of  3 

rithms  are  closely  related,  and  actually  have  RMPC  al9onthms  under  different 
equivalent  decision  variables  when  performing  disturbance  inputs 
a  closed-loop  prediction.  Based  on  this  analysis  we  developed  a  hybrid  algorithm  that 
retains  the  advantages  of  both:  a)  large  feasible  region  indicating  the  algorithm  can 
tolerate  higher  disturbance  levels;  b)  performance  guarantees  under  worst-case  dis- 


Fig.  4:  Flight  data  showing  show  the  relative  separation  error  between  2  UAVs  during 
an  autonomous  in-flight  rendezvous  with  a  photo  from  onboard. 


turbance  inputs.  As  shown  in  Fig.  3,  the  three  algorithms  have  similar  performance 
with  random  (but  bounded)  disturbance  inputs.  When  the  worst  case  disturbances 
are  applied,  however,  the  hybrid  algorithm  outperforms  the  CT  algorithm  because  it 
uses  a  min-max  cost.  With  a  more  constrained  system  (System  2),  the  hybrid  algo¬ 
rithm  achieves  better  performance  than  the  LMI-based  algorithm  because  its  larger 
feasible  set  gives  a  larger  operating  region. 

Multi-UAV  Testbed:  The  UAV  testbed  has  been  operated  autonomously  on  nu¬ 
merous  occasions  [9].  For  example,  Fig.  4  shows  the  relative  position  error  for  two  of 
the  UAVs  performing  an  autonomous  rendezvous.  The  position  error  is  shown  to  con¬ 
verge  to  within  25  m.  Wind  disturbances  during  this  flight  were  approximately  1  m/s 
(~5%  of  flight  speed).  The  relative  position  errors  show  the  vehicles  maintaining 
coordinated  flight  despite  the  moderate  disturbance  levels  acting  on  the  system. 

A  second  type  of  UAV  has  been  added  to 
the  testbed  using  a  Monocoupe  airframe 
(see  Fig.  5),  which  have  several  advantages 
over  the  Trainer  60  (double  wing  area  al¬ 
lows  for  a  payload  capacity  of  up  to  5  lbs 
and  larger  fuel  capacity  increases  the  flight 
time  to  2  hours).  These  vehicles  can  also 
be  purchased  as  almost  ready  to  fly,  so 
they  retain  the  operational  simplicity  orig¬ 
inally  envisaged  for  the  UAV  testbed.  The  current  testbed  has  four  Monocoupes  that 
fly  with  the  Trainer  60's  in  a  heterogeneous  team. 
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Fig.  5:  Comparison  of  Monocoupe  and 
Trainer  60  in  the  MIT  UAV  testbed. 
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Abstract 

The  research  objective  is  to  develop  algorithms  that  can  be  embedded  in  a  distributed 
coordination  and  control  architecture  for  teams  of  multiple  UAVs.  The  main  problem 
is  to  determine  the  role  of  errors  in  the  situational  awareness  (SA)  in  the  fleet  coordi¬ 
nation  problem  by  investigating  two  fundamental  questions:  how  does  specific  sensor 
information  impact  the  estimation  error,  and  how  does  this  estimation  error  impact 
the  control  performance?  The  answers  to  these  questions  will  provide  a  clear  indica¬ 
tion  of  the  impact  of  errors  in  the  SA  on  the  control  performance  and  should  enable 
the  design  of  an  efficient  (and  dynamic)  communication  architecture.  Our  research 
has  resulted  in  several  algorithms  that  use  rnixed-integer  linear  programming  (MILP) 
to  perform  both  the  activity  and  path  planning  components  of  the  team  coordina¬ 
tion,  and  recent  work  has  focused  on  improving  the  robustness  of  these  algorithms  to 
uncertainty  in  the  SA.  We  have  also  implemented  these  algorithms  on  our  external 
multi-UAV  testbed  and  a  new  indoor  multi-UAV  testbed. 

Research  Status 

The  following  progress  has  been  made  in  the  past  year: 

1.  Demonstrated  the  benefits  of  our  distributed  task  assignment  algorithm  for 
sparse  communication  networks.  Extended  previous  consensus  algorithms  to 
remove  the  bias  in  the  estimates  for  general  network  architectures  [12,16,19,5]. 

2.  Developed  new  theoretical  approach  to  optimal  search  that  captures  the  uncer¬ 
tainty  typically  present  in  a  priori  descriptions  of  the  targets  and  threats  in  the 
environment  [13,15,21,24]. 

3.  Developed  new  computationally  efficient  decentralized  MPC  (DMPC)  algo¬ 
rithms  for  multi-vehicle  systems  coupled  through  their  constraints,  which  is  the 
case  for  collision  avoidance  and  formation  flying  scenarios.  These  techniques 
maintain  robust  feasibility  of  the  entire  team  using  local  communication  and 
computation,  and  thus  scale  well  with  the  size  of  the  UAV  team  [1,17,4,18,3,7]. 

4.  Extended  the  DMPC  algorithms  to  enable  cooperation  between  vehicles  with¬ 
out  substantially  increasing  the  sub-problem  size  [14],  Results  indicate  that 
this  extension  has  the  potential  of  significant  performance  improvements  over 
previous  techniques  [22], 

5.  Performed  an  experimental  demonstration  of  our  distributed  MPC  algorithm 
through  its  distributed  implementation  on  three  rovers  [7]. 

The  following  sections  provide  more  detail  on  the  main  topics. 


Distributed  Planning:  Previous  reports  presented  a  new  robust  distributed  task 
assignment  (RDTA)  algorithm  for  a  team  of  UAVs.  We  extended  the  analysis  of  that 
algorithm  to  consider  the  performance  with  different  communication  architectures  and 
errors.  The  results  show  that  the  second  communication  step  introduced  during  the 
planning  phase  of  RDTA  is  crucial  for  very  sparse  networks  because  the  convergence 
rate  of  the  information  consensus  algorithms  tends  to  be  quite  slow  [12,19]. 

Developing  a  consensus  for  a  team  of  agents  with  inconsistent  information  is  a  core 
component  of  many  decentralized  planning  schemes,  including  RDTA.  Numerous  re¬ 
searchers  have  investigated  this  problem,  and  recent  results  proposed  a  Kalman  con¬ 
sensus  algorithm  (KCA)  and  gave  a  detailed  analysis  of  its  convergence.  However, 
we  demonstrated  that  this  KCA  can  result  in  biased  estimates  that  deviate  from  the 
centralized  solution,  if  it  had  been  computed.  The  bias  is  shown  to  be  related  to  a 
“double  counting”  of  the  information  from  agents  that  have  a  higher  outflow  (con¬ 
nectivity  to  the  other  agents)  in  the  network.  We  modified  the  algorithm  to  correctly 
handle  the  differences  in  outflows  in  general  networks,  and  the  proof  of  convergence 
for  this  Modified  Distributed  Kalman  Consensus  (MDKC)  algorithm  to  the  unbiased 
estimate  is  provided  for  both  static  and  dynamic  communication  networks  [16,5]. 

Robust  Planning:  We  developed  a  new  robust  approach  to  the  task  assignment 
of  UAVs  operating  in  uncertain  dynamic  environments  for  which  the  optimization 
data,  such  as  target  cost  and  target-UAV  distances,  are  time-varying  and  uncer¬ 
tain  [6,23].  The  impact  of  this  uncertainty  in  the  data  is  mitigated  by  combining 
robust  and  adaptive  planning,  resulting  in  the  Robust  Filter- Embedded  Task  Assign¬ 
ment  (RFETA)  algorithm  that  uses  both  proactive  techniques  that  hedge  against 
the  uncertainty  and  reactive  approaches  that  limit  churning  behavior  by  the  vehicles. 
Typical  simulation  results  for  a  small  problem  are  shown  in  Figure  1,  which  plots  the 
actual  accumulated  score  during  a  mission.  Compared  to  the  nominal  algorithm,  the 
robust  strategy  achieves  a  higher  final  accumulated  value,  which  is  consistent  with 
selecting  better  task  assignments  given  the  uncertain  data.  The  original  FETA  algo¬ 
rithm  gives  a  faster  convergence  to  the  final  value,  which  is  consistent  with  avoiding 
churning,  but  the  RFETA  approach  gives  the  same  fast  convergence,  a  higher  final 
score,  and  the  least  standard  deviation,  and  thus  is  the  best  overall  strategy. 

Robust  Search:  The  goal  of  this  research  is  robust  assignment  of  search  tasks  in  the 
presence  of  dynamic  and  uncertain  target  motion  [13,15,21,24],  The  approach  taken 
is  probabilistic,  based  on  discrete-state,  discrete-time  Markov  chain-like  models.  The 
target  motion  across  the  discretized  environment  is  described  by  a  probabilistic  state 
transition  matrix,  where  the  transition  to  adjacent  cells  is  fully  represented  by  prob¬ 
ability  distributions.  While  the  probabilities  used  to  encode  information  about  the 
environment  are  typically  assumed  to  be  exactly  known  in  the  search  theory  litera¬ 
ture,  they  are  often  the  result  of  prior  information  that  is  both  erroneous  and  delayed, 
and  will  likely  be  poorly  known  to  mission  designers.  Our  work  has  developed  a  new 
framework  that  accounts  for  this  uncertainty  in  the  probability  maps  for  stationary 
targets,  and  we  recently  extended  the  approach  to  consider  dynamic  environments. 

The  dynamic  case  considers  uncertainty  in  the  prior  information,  and  creates  Un- 


Time 

Fig.  1:  Accumulated  score  and  associ-  Fig.  2  :  Number  of  targets  found  (top) 
ated  confidence  level  for  four  task  plan-  and  thought  to  be  found  (bottom).  BPE 
ning  algorithms.  in  blue,  UPM  in  red. 

certain  Probability  Maps  (UPMs)  that  take  into  account  both  poor  knowledge  of  the 
probabilities  and  the  propagation  of  their  uncertainty  through  the  environment.  Many 
methods  could  be  chosen  to  describe  the  uncertainty,  such  as  a  truncated  Gaussian 
distribution,  but  these  distributions  must  be  chosen  wisely  because  the  measurement 
update  following  the  prediction  can  be  computationally  intensive  if  the  distributions 
are  not  conjugate  and  sampling  methods  are  needed  to  generate  the  posterior  dis¬ 
tribution.  Our  choice  of  the  Beta  distribution  [13]  to  describe  the  uncertainty  is 
conjugate  with  the  Bernoulli  sensor  model,  and  thus  computationally  efficient.  This 
new  approach  still  requires  an  approximation  in  the  prediction  step  that  is  similar  to 
other  techniques,  but  this  approximation  does  not  add  any  significant  computational 
load  because  it  can  be  written  as  a  closed-form  update  to  the  mean  and  variance  of 
the  Beta  distributions  associated  with  each  cell.  As  a  result,  the  prediction  and  mea¬ 
surement  update  stages  can  be  expressed  as  a  recursion  on  the  means  and  variances  of 
the  distributions,  which  are  very  simple  to  implement  and  conceptually  very  similar 
to  conventional  filtering  methods. 

A  key  result  of  this  work  is  a  new  algorithm  for  implementing  UPMs  in  real-time,  and 
it  was  demonstrated  in  various  simulations  that  this  algorithm  leads  to  more  cautious 
information  updates  than  previous  approaches,  and  thus  is  less  susceptible  to  false 
alarms.  For  example,  Figure  2  gives  results  for  a  simple  scenario  showing  that  our 
UPM  formulation  creates  a  more  realistic  search  strategy  than  the  traditional  Bayes 
update  scheme  (BPE),  which  is  overly  optimistic  [15].  These  results  also  provide  nu¬ 
merous  insights  on  the  effect  of  the  design  parameters  on  the  responsiveness  of  the  new 
algorithm.  This  work  has  been  extended  to  embed  the  uncertainty  in  the  probabilis¬ 
tic  target  motion  using  Dirichlet  distributions,  and  shows  significant  improvements 
in  the  search  performance  over  otherwise  mismatched  motion  models  [24]. 

Trajectory  Optimization:  We  extended  the  robust  safe  but  knowledgeable  (RSBK) 
algorithm  [7-9,11,17,18]  to  the  multi-vehicle  case.  RSBK  has  the  advantage  that  it 
enables  the  use  of  much  shorter  planning  horizons  while  still  preserving  the  robust 
feasibility  guarantees  of  our  previous  MPC  approaches.  A  distributed  version  was  also 
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developed  (called  DRSBK),  which  is  more  suitable  for  real-time  execution,  retains  the 
robust  feasibility  guarantees  of  the  centralized  approach,  and  only  requires  that  each 
agent  have  local  knowledge  of  the  environment  and  neighbor  vehicles  plans.  DRSBK 
also  facilitates  the  use  of  a  significantly  more  general  implementation  architecture 
for  the  distributed  trajectory  optimization,  which  further  decreases  the  delay  due  to 
computation  time  [7,17]. 

Extending  that  work  further,  we  developed  a 
new  decentralized  trajectory  optimization  ap¬ 
proach  for  systems  with  independent  dynam¬ 
ics  but  coupled  constraints  [14,22].  The  goal 
was  to  improve  the  performance  of  the  entire 
fleet  by  including  more  cooperation  between 
the  vehicles.  This  is  achieved  by  exploiting  the 
sparse  structure  of  active  couplings  that  is  in¬ 
herent  in  many  of  the  trajectory  optimization 
problems  of  interest.  This  enables  each  local 
optimization  to  use  a  low-order  parameteriza¬ 
tion  of  the  other  agents  states,  thereby  facili¬ 
tating  negotiation  while  keeping  the  problem 
size  small.  The  key  features  of  this  approach 
include  (a)  no  central  negotiator  is  required; 

(b)  it  maintains  feasibility  over  the  iterations, 
so  the  algorithm  can  be  stopped  at  any  time; 
and  (c)  the  local  optimizations  are  shown  to 
always  decrease  the  overall  cost. 
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Fig.  3  :  Performance  and  com¬ 
putation  comparison  with  riy  — 
{5,  7, 10, 15}  (top  to  bottom). 


Figure  3  shows  simulation  results  comparing 
the  global  objective  value  and  cumulative  com¬ 
putation  time  of  three  algorithms:  centralized, 

(non-cooperative)  decentralized,  and  the  new 
cooperative  decentralized.  Different  planning  horizons  N  —  {4, 6, 8}  and  fleet  sizes 
nv  were  considered  to  investigate  the  scalability  of  the  algorithms.  The  solutions  of 
the  decentralized  non-cooperative  approach  have  very  high  cost  and,  although  the 
computation  times  are  small,  are  off  the  plot.  The  lines  with  x  show  the  evolution  of 
the  global  cost  of  the  cooperative  decentralized  algorithm.  The  results  show  that  this 
new  algorithm  has  objective  values  comparable  to  the  centralized  (and  hence  globally 
optimal)  solutions  (o),  but  the  computation  scales  much  better  with  the  fleet  size. 


Multi-vehicle  Testbeds:  The  multi-rover  testbed  was  used  to  test  the  DRSBK 
algorithm  in  real-time.  Numerous  scenarios  (e.g.  Fig.  4)  were  constructed  to  high¬ 
light  key  features  of  the  algorithm:  on-board  laptops  generate  trajectories  on-line, 
which  shows  the  computational  advantages  for  real-time  applications;  the  rovers  are 
moving  initially,  which  highlights  the  ability  to  quickly  initialize  the  algorithms;  the 
vehicles  are  required  to  maneuver  in  a  constrained  environment,  which  demonstrates 
the  robust  feasibility  under  the  action  of  disturbances  and  that  plans  based  on  dis- 


Fig.  4:  DRSBK  demonstration  for  obstacle  and  vehicle  avoidance 

tributed  computation  can  satisfy  the  coupling  collision  avoidance  constraints.  The 
results  also  demonstrated  online  dynamic  grouping  and  re-grouping  of  the  vehicles, 
enabling  parallel  computation. 


Fig.  5:  Coupe  UAVs,  Coupe’s  flying,  Coupe  pan/tilt  camera 


The  UAV  testbed  (Figs.  5  and  6)  is  currently  being  used 
to  perform  research  on  autonomous  search  and  target 
tracking  [2].  The  hardware  is  being  upgraded  to  carry 
additional  onboard  computers  to  increase  the  level  of 
autonomy,  provide  onboard  vision  processing,  and  sup¬ 
port  ad-hoc  communication  networks  between  vehicles. 
The  coordination  algorithms  described  in  the  report  are 
also  being  demonstrated  on  a  unique  indoor  testbed  that 
includes  multiple  flying  and  ground  vehicles  [20]. 


Fig.  6:  Automatic  video 
tracking  antenna. 
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Abstract 

The  research  objective  is  to  develop  algorithms  that  can  be  embedded  in  a  distributed 
coordination  and  control  architecture  for  teams  of  multiple  UAVs.  The  goal  is  to  de¬ 
termine  the  role  of  errors  in  the  situational  awareness  (SA)  in  the  fleet  coordination 
problem  by  investigating  two  fundamental  questions:  how  does  specific  sensor  infor¬ 
mation  impact  the  estimation  error,  and  how  does  this  estimation  error  impact  the 
control  performance?  The  answers  to  these  questions  should  provide  a  clear  indica¬ 
tion  of  the  impact  of  errors  in  the  SA  on  the  control  performance  and  should  enable 
the  design  of  an  efficient  (and  dynamic)  communication  architecture.  Our  research 
has  resulted  in  new  algorithms  to  perform  distributed  task  assignment  that  is  robust 
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to  environmental  disturbances,  and  multi-UAV  search  that  is  robust  to  uncertainty  in 
the  knowledge  of  the  environment.  We  have  implemented  many  of  these  algorithms 
on  our  new  indoor  multi-UAV  testbed  called  RAVEN. 


Research  Status 

The  following  progress  has  been  made  in  the  past  year: 

1.  Further  analysis  of  our  Unbiased  Kalman-based  Consensus  Algorithm  [4,  9,  10, 
11].  Developed  a  new,  completely  decentralized,  auction-based  task  assignment 
algorithm  that  creates  conflict  free  assignments  with  limited  communication 
and  without  relaying  of  information  between  agents.  The  algorithm  is  proven 
to  converge  and  is  shown  to  be  resource  efficient  [12]. 

2.  Extended  prior  work  on  robust  MPC  [1,  3,  6,  7,  16,  17]  using  constraint  tighten¬ 
ing  by  generalizing  the  feedback  correction  policy.  The  new  approach  is  shown 
to  represent  a  strictly  larger  class  of  feedback  policies  when  compared  to  pre¬ 
vious  algorithms  [19].  We  also  developed  a  new  off-line  convex  optimization 
procedure  to  design  a  disturbance  rejection  controller  that  can  tolerate  much 
stronger  disturbances  leading  to  significant  performance  improvements  over  pre¬ 
vious  approaches  at  high  disturbance  levels  [19,  21]. 

3.  Developed  a  new  cooperative  decentralized  optimization  algorithm  [6,  17]  that 
minimizes  the  global  performance  by  solving  a  series  of  small  subproblems  across 
the  team  without  just  reproducing  the  global  optimization  problem  on  each  vehi¬ 
cle.  The  global  objective  is  proven  to  monotonically  decrease  over  each  iteration, 
and  the  feasibility  of  the  entire  team  is  maintained.  The  algorithm  was  extended 


to  develop  a  cooperative  form  of  the  decentralized  robust  safe  but  knowledgeable 
(DRSBK)  algorithm  for  multi- vehicle  trajectory  optimization  [18].  CDRSBK 
achieves  cooperative  behaviors  by  enabling  vehicles  to  sacrifice  the  local  cost  if 
it  leads  to  a  global  improvement  [18,  20,  21]. 

4.  Experimental  demonstration  of  our  distributed  planning  algorithms  [21,  22]  on 
a  new  indoor  UAV  testbed  [24,  26]. 

The  following  sections  provide  more  detail  on  the  main  topics. 

Distributed  Planning:  Developing  consensus  for  a  team  of  agents  with  incon¬ 
sistent  information  is  a  core  component  of  many  decentralized  planning  schemes. 
We  recently  extended  the  analysis  of  the  Unbiased  Kalman  Consensus  Algorithm 
(UKCA)  [9,  12],  with  the  following  key  observations:  (i)  exchanging  both  the  state  xx 
and  uncertainty  weights  Px  in  each  iteration  enables  UKCA  to  converge  to  the  desired 
weighted  average  for  a  very  general  class  of  communication  networks;  (ii)  the  scaling 
factor  introduced  in  UKCA  is  associated  with  the  ou tflow,of  the  sending  agent,  which 
differs  from  other  approaches  in  the  literature;  (iii)  this  scaling  also  has  the  implicit 
effect  of  essentially  setting  the  outflows  of  all  agents  to  1,  but  has  no  effect  on  the 
inflow.  Thus  the  scaling  does  not  just  implicitly  balance  the  network,  and  standard 
convergence  results  do  not  apply.  We  therefore  provide  a  proof  of  the  convergence 
of  UKCA  to  the  unbiased  estimate  for  both  static  and  dynamic  communication  net¬ 
works  [4,  9]. 

Many  distributed  task  assignment  algorithms  (including  RDTA  [10,  12])  require  that 
agents  relay  information  to  neighbors.  Our  new  auction  based  task  assignment 
(ABTA)  algorithm  combines  consensus  and  auction  ideas  to  create  conflict  free  as¬ 
signments  using  limited  communication  without  requiring  relaying  of  information. 
The  basic  idea  of  the  ABTA  is  for  each  agent  to  act  in  a  greedy  way  and  choose  the 
best  task  for  itself.  It  then  communicates  its  value  for  the  tasks  with  its  neighbors 
to  determine  if  it  is  the  best  agent  to  be  assigned  to  that  specific  task.  If  an  agent 
then  determines  that  another  agent  can  achieve  a  better  value  by  being  assigned  to  a 
task,  it  discards  the  task  and  chooses  the  next  best  task.  This  is  similar  to  other  auc¬ 
tion  algorithms,  but  in  ABTA  the  information  is  only  exchanged  between  neighbors. 
ABTA  is  proven  to  converge  to  a  conflict  free  complete  assignment  for  very  general 
communication  networks.  A  comparison  to  implicit  coordination  shows  that  the  per¬ 
formance  is  within  7%  of  the  best  assignment,  but  there  is  a  significant  reduction  in 
the  communication  requirements  [12], 

Robust  Search:  We  developed  computationally  tractable  search  algorithms  that 
account  for  the  uncertainty  in  the  optimization  parameters  [13,  14,  15].  The  work  is 
based  on  Markov  Chain  theory  and  its  applications  to  finite  state,  finite  action,  finite 
horizon  Markov  Decision  Processes  (MDPs).  It  gives  a  computationally  tractable  so¬ 
lution  to  finding  the  optimal  policies  subject  to  uncertainty  in  the  initial  probabilities 
and  the  state  transition  matrix.  The  novelty  in  this  work  is  the  adaptation  of  the 
Sigma  Point  sampling  methods  (SPSM)  to  dynamic  multi-stage  problems  where  the 
probabilistic  parametrization  may  be  poor.  We  show  that,  the  SPSM  can  be  used 
to  generate  computationally  economical  samples  of  the  uncertain  transition  matrices. 


Furthermore,  the  probability  distributions  on  the  uncertain  parameters  can  be  main¬ 
tained  in  a  closed  form  approximation  with  virtually  no  additional  computational 
overhead  by  compactly  describing  them  with  the  Diriehlet  distribution. 

Distributed  Trajectory  Optimization:  We  developed  a  new  form  of  robust 
Model  Predictive  Control  (MPC)  using  constraint  tightening,  where  the  degree  of 
tightening  is  a  convex  function  of  the  feedback  parameters  [19].  The  proposed  ap¬ 
proach  is  shown  to  be  able  to  represent  a  strictly  larger  class  of  feedback  policies 
when  compared  to  previous  algorithms.  Further  analytical  results  provide  a)  neces¬ 
sary  and  sufficient  conditions  on  the  choice  of  feedback  parameters  for  the  existence 
of  a  nonempty  output  constraint  set;  and  b)  a  sufficient  condition  for  the  existence  of 
a  nonempty  robust  invariant  set.  Combined  with  the  convex  parametrization,  this  en¬ 
ables  an  off-line  linear  optimization  to  determine  the  feedback  policy  that  can  tolerate 
much  stronger  disturbances  while  robustly  satisfying  the  constraints. 

Simulation  results  show  that  this  proposed  con¬ 
troller  leads  to  both  a  larger  feasible  set  and  perfor¬ 
mance  improvements  under  the  action  of  strong  dis¬ 
turbances.  For  example,  Fig.  1  shows  a  set  of  initial 
states  from  which  a  controller  has  a  feasible  solu¬ 
tion,  for  three  controllers  2-step  nil,  3-step  nil, 
and  max-dist.  In  the  top  plot  with  (3  =  1.5,  the 
controllers  3-step  nil  and  max-dist  have  a  larger 
initial  feasible  state  set  compared  to  2- step  nil. 

When  the  disturbance  level  (3  =  1.89,  the  controller 
max-dist  has  a  feasible  region  that  is  much  larger 
than  controller  3-step  nil,  as  shown  in  Fig.  1  (bot¬ 
tom).  The  larger  set  allows  for  the  system  to  operate 
in  a  larger  region,  directly  affecting  the  performance 
of  the  controller  [19]. 

We  also  extended  prior  work  to  develop  a  coopera¬ 
tive  form  of  the  distributed  robust  Model  Predictive  Control  that  is  used  for  multi¬ 
vehicle  trajectory  optimization  (called  CDRSBK)  [20].  The  overall  goal  is  to  develop 
an  approach  that  solves  small  subproblems  but  minimizes  a  fleet-level  objective.  In 
this  new  algorithm,  vehicles  solve  their  subproblems  in  sequence,  while  simultaneously 
generating  feasible  perturbations  to  the  decisions  of  the  other  vehicles.  In  order  to 
avoid  reproducing  the  global  optimization,  the  decisions  of  other  vehicles  are  parame¬ 
terized  using  a  much  smaller  number  of  variables  than  in  the  centralized  formulation. 
The  resulting  algorithm  is  shown  to  be  robustly  feasible  under  the  action  of  unknown 
but  bounded  persistent  disturbances  and  monotonically  decreases  the  fleet  objective 
while  cycling  through  the  vehicles  in  the  fleet  and  over  the  time. 

To  demonstrate  the  algorithm,  consider  a  scenario  with  two  vehicles  trying  to  reach 
their  own  targets  while  avoiding  obstacles  and  each  other.  Fig.  2  shows  the  trajecto¬ 
ries  generated  by  the  original  DRSBK  and  cooperative  CDRSBK  algorithms.  Because 
UAV  1  has  to  traverse  a  longer  route,  the  best  solution  is  for  UAV  2  to  “move  over”. 
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Fig.  1  :  Set  of  states  for  which 
a  feasible  solution  exists. 
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Fig.  2:  Trajectories  executed  using  DRSBK  (left);  CDRSBK  (middle  left);  and  time 
history  of  the  objective  function 


While  both  distributed  algorithms  maintain  feasibility  under  the  action  of  distur¬ 
bances,  only  the  CDRSBK  algorithm  naturally  recovers  this  cooperative  behavior. 
Fig.  2  also  shows  the  time  history  of  the  individual  cost  Jp  and  the  fleet  cost  J.  Both 
algorithms  monotonically  decrease  the  fleet  objective,  but  as  shown  in  the  far  right 
figure,  the  cooperative  formulation  allows  the  individual  cost  to  increase  if  it  leads 
to  a  larger  improvement  of  the  fleet  cost.  Between  optimization  14-17,  UAV  2  yields 
to  the  vehicle  with  a  worse  cost  (UAV  1),  enabling  a  large  reduction  in  the  fleet  cost 
J.  The  average  computation  time  for  this  scenario  was  0.050s  for  DRSBK  and  0.064s 
for  CDRSBK.  These  simulation  results  clearly  demonstrate  the  proposed  algorithm 
can  improve  the  fleet  objective  by  temporarily  sacrificing  on  the  individual  objective, 
with  only  a  minimal  impact  on  the  computation  time. 

Multi- vehicle  Testbeds:  We  have  developed  a  unique  indoor  multi-vehicle  test 
facility  called  RAVEN  (Real-time  indoor  Autonomous  Vehicle  test  ENvironment)  to 
study  long-duration  missions  in  a  controlled  environment  [24,  22].  A  key  feature  of 
RAVEN  is  the  Vicon  global  metrology  system.  By  attaching  reflective  balls  to  the 
vehicle’s  structure,  the  Vicon  MX  Camera  system  and  Tarsus  software  can  track  and 
compute  the  vehicle’s  position  and  attitude  information  at  rates  up  to  120  Hz,  with 
a  10  ms  delay,  and  sub-mm  accuracy.  Just  as  GPS  spurred  the  development  of  large- 
scale  UAVs,  we  expect  this  new  sensing  capability  to  have  a  significant  impact  on  3D 
indoor  flight,  which  has  historically  been  restricted  to  very  small  volumes. 

RAVEN  is  an  excellent  rapid  prototyping  en¬ 
vironment  for  UAV  research.  In  addition  to 
demonstrating  advanced  path  planning  con¬ 
cepts  (e.g.,  CDRSBK  test  flights)  [21,  22]  and 
flying  large  UAV  teams  (see  Fig.  3),  it  has  also 
been  used  for  multi-UAV  search  and  track  us¬ 
ing  onboard  vision  [27,  28]  (see  Figs.  4-5).  In 
this  case,  the  high-level  mission  management 
system  ensures  that  an  adequate  number  of  Fig.  3:  10  UAVs  with  1  operator. 
UAVs  are  available  to  search  and  track  targets  in  unknown  locations,  while  a  coop¬ 
erative  algorithm  was  used  to  generate  accurate  estimates  of  the  target  locations  by 
fusing  sensor  data  from  multiple  UAVs  (see  Fig.  5)  [27]. 

Fig.  6  shows  an  example  of  rapid  prototyping  done  on  aggressive  flight  manoeuvres 
using  RAVEN  [26].  The  primary  objective  of  this  work  is  to  design  hybrid  nonlinear 


Fig.  4:  Multi-UAV  search  and  track 


Fig.  5:  Persistent  tracking  of  UGV 


Fig.  6:  Autonomous  aircraft  hover,  transition  to  level  flight  and  back  to  hover. 


controllers  to  execute  very  agile  aerobatics,  similar  to  that  expected  for  future  MAV 
operations.  Fig.  6  shows  a  sequence  that  illustrates  the  current  capabilities  (vertical 
take-off,  hover,  transition  to  horizontal  flight,  tracking  a  tight  circular  path,  transition 
back  to  vertical)  [25,  26].  Videos  are  available  at  http://aerobatics.mit.edu. 
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Abstract — This  paper  presents  a  new  distributed  robust  model 
predictive  control  algorithm  for  multivehicle  trajectory  optimiza¬ 
tion  and  demonstrates  the  approach  with  numerical  simulations 
and  multivehicle  experiments.  The  technique  builds  on  the 
robust-safe-but-knowledgeable  (RSBK)  algorithm,  which  is  devel¬ 
oped  in  this  paper  for  the  multivehicle  case.  RSBK  uses  constraint 
tightening  to  achieve  robustness  to  external  disturbances,  an 
invariant  set  to  ensure  safety  in  the  presence  of  changes  to  the 
environment,  and  a  cost-to-go  function  to  generate  an  intelligent 
trajectory  around  known  obstacles.  The  key  advantage  of  this 
RSBK  algorithm  is  that  it  enables  the  use  of  much  shorter  planning 
horizons  while  still  preserving  the  robust  feasibility  guarantees  of 
previously  proposed  approaches.  The  second  contribution  of  this 
paper  is  a  distributed  version  of  the  RSBK  algorithm,  which  is 
more  suitable  for  real-time  execution.  In  the  distributed  RSBK 
(DRSBK)  algorithm,  each  vehicle  only  optimizes  for  its  own 
decisions  by  solving  a  subproblem  of  reduced  size,  which  results 
in  shorter  computation  times.  Furthermore,  the  algorithm  retains 
the  robust  feasibility  guarantees  of  the  centralized  approach 
while  requiring  that  each  agent  only  have  local  knowledge  of  the 
environment  and  neighbor  vehicles’  plans.  This  new  approach  also 
facilitates  the  use  of  a  significantly  more  general  implementation 
architecture  for  the  distributed  trajectory  optimization,  which 
further  decreases  the  delay  due  to  computation  time. 

Index  Terms — Cooperative  control,  distributed,  invariant  set, 
multivehicle  experiments,  model  predictive  control  (MPC),  robust 
feasibility. 

I.  Introduction 

ODEL  predictive  control  (MPC)  or  receding  horizon 
control  (RHC)  are  natural  techniques  to  approach  to 
trajectory  optimization  problems  for  unmanned  air  vehicles 
(UAVs)  because  they  can  systematically  handle  constraints 
such  as  vehicle  dynamics,  flight  envelope  limitations,  and 
no-fly  zones  [1  ]— [4].  MPC  uses  numerical  optimization  for 
online  replanning,  and  a  model  of  the  system  is  embedded 
within  the  optimization  to  predict  the  future  system  behavior. 
A  key  advantage  is  that  it  can  operate  close  to  the  constraint 
boundaries  and  obtain  better  performance  than  traditional  ap¬ 
proaches  [5]— [7].  Recent  research  has  focused  on  robust  MPC, 
which  is  robust  to  external  disturbances  or  inherent  discrep¬ 
ancies  between  the  model  and  the  real  process,  and  numerous 
techniques  have  been  proposed  in  the  past  decade  [8]-[  15]. 
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When  using  robust  MPC  in  dynamic  environments,  fast  online 
computation  is  needed  in  response  to  new  information.  However, 
the  computation  required  scales  poorly  with  both  the  length  of 
the  trajectories  being  planned  and  the  number  of  vehicles  to  be 
planned  for.  This  paper  addresses  both  of  these  scalability  issues, 
adopting  shorter  horizons  for  scaling  with  length  and  distributed 
computation  for  scalability  with  fleet  size.  The  first  contribu¬ 
tion  of  this  paper  is  the  extension  of  the  constraint  tightening 
robust  MPC  [12],  [16]  to  create  a  robust-safe-but-knowledge- 
able  (RSBK)  algorithm.  This  algorithm  plans  over  only  a  short 
horizon,  terminating  in  a  robust  control  invariant  set  that  needs 
not  to  be  near  the  goal.  A  cost-to-go  function  is  then  used  that 
provides  a  good  estimate  of  the  path  beyond  the  planning 
horizon  to  the  goal  [2].  This  combination  enables  a  much  faster 
computation  than  existing  robustness  methods  [17],  [18]  that 
require  the  target  be  reachable  within  the  planning  horizon, 
which  will  require  long  horizons  for  long  maneuvers. 

For  multivehicle  control,  decentralized  MPC  (DMPC)  [19] 
addresses  the  computational  issue  associated  with  the  central¬ 
ized  optimization  by  breaking  the  optimization  into  smaller  sub¬ 
problems,  with  the  rationale  that  solving  many  small  problems 
is  faster  and  more  scalable  than  solving  one  large  problem.  For 
multivehicle  problems,  it  is  natural  to  divide  the  problem  such 
that  the  plan  for  each  vehicle  is  computed  on-board  that  vehicle, 
i.e.,  such  that  local  decisions  are  made  locally.  Besides  the  com¬ 
putational  advantages  of  DMPC,  this  also  offers  a  reduction  in 
the  amount  of  data  that  needs  to  be  exchanged  between  vehicles, 
and  a  potentially  reduced  level  of  dependency  of  any  individual 
vehicle.  The  challenge  of  decentralized  control  is  to  ensure  that 
distributed  decision  making  leads  to  actions  that  are  consistent 
with  the  actions  of  others  and  satisfy  the  coupled  constraints. 
Various  approaches  have  been  investigated,  including  treating 
the  influence  of  other  subsystems  as  an  unknown  disturbance 
[20],  coupling  penalty  functions  [3],  [4],  [21],  partial  grouping 
of  computations  [22],  loitering  options  for  safety  guarantees 
[23],  and  dynamic  programming  [24],  Some  approaches  involve 
iterative  negotiations  between  subsystems  [21],  [25]  and  apply 
game  theory  to  study  convergence.  Decentralization  is  further 
complicated  when  disturbances  act  on  the  subsystems,  making 
the  prediction  of  future  behavior  uncertain. 

A  second  contribution  of  this  paper  is  to  develop  a  distributed 
form  of  RSBK  (DRSBK).  The  primary  computational  benefit  of 
the  DRSBK  algorithm  over  RSBK  is  that  each  vehicle  only  cal¬ 
culates  its  own  trajectory,  which  is  obtained  by  solving  a  sub¬ 
problem  of  reduced  size.  The  algorithm  creates  a  queueing  order 
of  nonconflicting  groups  of  vehicles,  where  each  group  optimizes 
sequentially,  while  vehicles  within  a  group  solve  their  subprob¬ 
lems  in  parallel.  This  does  not  require  iteration,  which  is  crucial 
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for  a  real-time  implementation  over  a  realistic  communication 
network.  This  paper  also  presents  a  generalization  of  the  imple¬ 
mentation  architecture  for  widely  separated  teams  of  vehicles.  In 
particular,  we  define  a  local  neighborhood  of  each  vehicle  to  be 
all  other  vehicles  that  could  have  a  direct  conflict  with  that  ve¬ 
hicle.  By  limiting  the  number  of  vehicles  considered  to  only  those 
within  a  local  region  of  each  vehicle,  the  number  of  constraints 
in  each  subproblem  can  be  significantly  reduced.  This  modifica¬ 
tion  further  simplifies  the  DRSBK  computation,  but  although  the 
plans  are  only  communicated  locally,  DRSBK  is  shown  to  main¬ 
tain  the  robust  feasibility  of  the  entire  fleet.  This  architecture  gen¬ 
eralizes  the  rigid  implementation  approaches  of  [1 8],  [23]  to  en¬ 
able  some  of  the  vehicles  to  compute  their  plans  simultaneously, 
which  can  significantly  reduce  the  delay  incurred. 

This  paper  is  organized  as  follows.  Following  the  problem 
setup  in  Section  II,  Section  III  presents  the  RSBK  algorithm. 
Section  IV  extends  the  RSBK  algorithm  to  the  distributed  com¬ 
putation  using  only  local  information.  Section  V  shows  several 
simulation  results  and  Section  VI  shows  experimental  results  on 
the  hardware  testbed. 

II.  Problem  Statement 

The  problem  of  interest  has  the  overall  goal  of  reaching  the 
target  while  robustly  maintaining  feasibility.  In  this  paper, 
p,  g,  r  that  are  used  as  an  index  or  superscript  denote  the  vehicle 
number,  subscript  k  denotes  the  current  time  step,  and  subscript 
j  denotes  the  prediction  step.  There  are  total  of  n  vehicles  whose 
dynamics  are  decoupled  and  are  described  by  an  LTT  model 

+  +  K  (1) 

for  p  —  1, . . . ,  n,  where  xrk  is  the  state  vector,  upk  is  the  input 
vector,  and  ufk  is  the  disturbance  vector  for  the  pth  vehicle.  The 
disturbances  wvk  are  unknown  but  are  assumed  to  lie  in  known 
bounded  sets 

vfk  e  Wp.  (2) 

The  environment  has  obstacles  to  be  avoided  and  the  vehicles 
have  flight  envelope  limitations.  The  general  output  sets  yp  cap¬ 
ture  these  local  constraints  of  each  vehicle  p  =  1, . . . ,  n 

Cp3?k  +  Dpupk  =  G  Vp.  (3) 

Vehicles  are  coupled  through  the  constraints  and  a  further  set  of 
constraints  c.  =  1, . . . .  nc  are  applied  to  the  sum  of  the  outputs 
from  each  vehicle 


where  r£  is  a  position  of  the  vehicle  p,  and  2d  is  the  minimum 
separation  distance.  Note  that  each  set  Zc  is  nonconvex  in  this 
case.  Finally,  the  objective  of  the  trajectory  optimization  is  to 
navigate  the  vehicles  to  their  assigned  targets,  and  the  objective 
function  is  the  sum  of  individual  costs 

Xnp  e  Xp  (6) 

n  Nr-1 

P= 1  fc=0 

where  Np  is  the  time  of  arrival  at  vehicle  p’s  target  Xlf  and  is  a 
variable  to  be  minimized,  and  lp  is  a  staged  cost  of  vehicle  p. 

III.  Robust  Safe  But  Knowledgeable  Algorithm 

This  section  presents  a  RSBK  algorithm  [26].  When  applied 
to  multi-vehicle  control,  this  algorithm  solves  a  centralized 
problem,  i.e.,  solving  for  the  plans  of  all  vehicles  p  =  1, . . . ,  rt 
in  a  single  optimization.  Section  IV  discusses  how  this  com¬ 
putation  can  be  separated  into  a  sequence  of  smaller  problems 
and  distributed  across  the  vehicles  in  the  team. 

A.  Algorithm  Description 

Solving  a  single  optimization  ( 1  )-(7)  is  not  tractable  when  the 
vehicle  flies  through  complex  environment  to  a  distant  target, 
because  the  complexity  of  the  optimization  grows  rapidly  with 
the  number  of  steps  Np  required  to  reach  the  target.  Further¬ 
more,  the  situational  awareness  can  change  as  the  vehicle  flies 
and  there  could  be  significant  uncertainties  in  the  far  future.  It  is 
inefficient  to  devote  considerable  computational  effort  to  plans 
for  the  far  future,  since  these  are  likely  to  be  revised  in  the  light 
of  future  learning. 

The  RSBK  algorithm  does  not  require  the  target  arrival  con¬ 
straint  (6)  be  satisfied  in  the  planning  horizon,  allowing  the  con¬ 
troller  to  use  a  short  planning  horizon.  A  cost-to-go  function  is 
used  to  provide  a  good  estimate  of  the  remainder  of  the  path 
to  the  target,  even  in  a  complicated  environment  [2],  In  order 
to  maintain  safety  [27]  of  the  vehicle  under  the  changes  in  the 
environment,  the  trajectory  is  required  to  terminate  in  a  robust 
control  invariant  set.  However,  this  set  need  not  be  at  or  around 
the  target,  as  is  common  in  other  MPC  methods  [8],  [9],  [18]. 
More  detailed  explanation  is  given  later  in  this  subsection. 

In  this  paper,  the  prediction  of  a  value  at  time  ( k+j )  made  at 
time  k  is  denoted  by  the  subscript  (k  +  j  \  k).  The  online  MPC 
optimization  develops  the  control  inputs  for  a  short  horizon  of 
N  steps.  The  optimization  Pjxfc)  at  time  k  is  defined  as 


Vc:  zk,c  =  EcxPk’  VP 

ZXr  €  ^  (4) 

P=  1 

where  zpk  c  denotes  p's  variable  that  is  coupled  with  other  ve¬ 
hicles’  variables.  For  pair-wise  collision  avoidance  constraints, 
each  constraint  c  has  only  two  nonzero  matrices  Ep  and  E'f  and 
enforces  a  minimum  separation  between  that  pair  of  vehicles 

(5) 


E  j  E ''  (<+,  i*.  *J+, ,  *)+/'(<«•, .) 

■i‘  » p=l  1=0  1 

p=l _ «  V  J 

subject  to  Vp  =  1 . n.  and  Vj  =0, _ N  -  1 


(8) 


K|*  =  K 

K+j+i|*  =  APxk+j  I  k  +  BPK+J  I  k 

rt+iik  =  cp3rk+nk  +  iyvr^illreyi 


j  |  k 

Ck.+j  |  k.c 


fc+j  |  k 


Epxp 


c  k+j  |  k 


(9) 

(10) 

(ID 

(12) 


IK  -  rl\\  > 2d 
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^>2  Zk+j  \k,c  ^  Zj,c,  Vc  —  1,.. 

.  ,nc 

(13) 

P= 1 

Xk+N  |  k  ^  Qk 

(14) 

Qp  =  npk  ~  lpn_xw‘ 

(15) 

Vxp  6  npk 

apx?  +  bpkp{xp)  +  lfn_iWp  e  np,  v« >p  e  wp 

Cpxv  +  Dpkp(xp)  g  y^-i 
Vc=l,...,nc:  £p=i  Epxp  €  -Zjv-i.c 

(16) 

The  states  in  (9)  are  the  measured  states  of  vehicle  p.  The 
decision  variables  are  the  control  inputs  up  k  and  the  terminal 
invariant  set  Qpk  that  ensures  the  safety  of  the  vehicle  beyond 
the  planning  horizon.  Note  that  predictions  (10)  are  made  using 
only  the  nominal  system  model,  with  no  disturbance.  In  order 
to  guarantee  robustness  against  disturbances  w,  the  sets  yp  are 
constructed  by  tightening  the  original  set  yp  using  a  linear  con¬ 
troller  Kp  that  rejects  the  disturbance  [17] 

y%=yp,  y-+i=yp  ~  (cp + lfk])  lpwp  vt  (i7) 

where  the  operator  ~  denotes  the  Pontryagin  difference  that  has 
the  property  [28] 

a  €  {A  ~  B),  baB=>a-\-bEA 

and  Lp  is  a  state  transition  matrix 

Lp0  =  I,  Lp+1  =  [Ap  +  BpKp)  Lp  Vr-  08) 

The  notation  Vj~  implies  Vj  =  0, . . . ,  N  -  2,  and  Vj  im¬ 
plies  j  =  0,  —  1.  Equations  (12)  and  (13)  represent  the 

inter-vehicle  constraints  such  as  collision  avoidance,  and  more 
details  on  the  implementation  are  found  in  Appendix  B.  Similar 
tightening  is  performed  on  the  coupling  constraint  sets  in  (13), 
allowing  uncertainty  margin  for  all  subsystems  within  each  con¬ 
straint  (Vc  =  1, . . .  ,nc) 

Zj+i,c  -  Zj.c  ~  {ElrL)W\  ©  •  •  ■  ©  E?L]lWn)  V.r 

Z0,c  -  Ze  (19) 

where  the  operator  ©  denotes  the  Minkowski  summation  [28]. 
Unlike  other  robust  MPC  approaches,  the  constraint  tightening 
approach  does  not  increase  the  complexity  of  the  problem  and 
is  well  suited  for  real-time  applications.  Another  advantage  of 
this  approach  is  that  the  optimization  considers  the  entire  range 
of  vehicle  dynamics  allowed  by  the  constraints  (1 1 )— ( 1 3). 

The  set  Qpk  in  (14)  is  called  a  safety  set,  defined  by  (15).  The 
set  7 Zpk  is  a  robust  control  invariant  admissible  set  [29]  that  has 
a  property  (16).  The  property  states  that  once  the  vehicle  enters 
the  set  lZk.  the  vehicle  can  remain  safe  indefinitely,  satisfying 
all  the  constraints  using  a  predetermined  terminal  control  law 
kp(xp).  The  vehicle  is  safe  also  against  any  changes  in  the  en¬ 
vironment  that  occur  beyond  this  safety  set.  This  terminal  set  Qp 
moves  with  the  vehicle  towards  the  target  and  therefore  a  deci¬ 
sion  variable  in  the  online  optimization,  as  indicated  in  (8).  The 
RSBK  algorithm  parameterizes  the  invariant  set,  and  by  using 


Fig.  1 .  Representation  of  the  cost-to-go  function  showing  the  three  levels  of 
resolution  used  to  approximate  a  complete  path  to  the  goal. 


nilpotent  candidate  controllers,  which  gives  LpN_l  =  0,  it  can 
solve  for  a  simple  nominal  control  invariant  admissible  set  [  16]. 
One  simple  invariant  set  for  fixed-wing  aircraft  is  a  loiter  circle, 
or  for  rotorcraft,  any  point  with  zero  velocity  is  invariant  [27]. 
Detailed  examples  are  given  in  Section  V.  Note  that  vehicle  q's 
safety  set  Qqk  can  overlap  with  vehicle  p's  path  to  its  safety  set 
Qp  without  any  issues.  This  is  because  by  the  time  q  reaches  Qqk, 
vehicle  p  has  already  executed  the  portion  that  overlaps  with  Qk . 

The  function  fp{xk+N  ^  j  in  (8)  represents  the  cost-to-go  be¬ 
yond  the  planning  horizon  and  is  associated  with  the  terminal 
states  of  the  planned  trajectory.  The  function  is  designed  to  pro¬ 
vide  a  reasonable  estimate  of  the  length  of  feasible  paths  to  the 
goal  using  a  sparse  set  of  grid  points,  such  as  obstacle  comers  [2] 
shown  in  Fig.  1  and  vertices  of  other  vehicles’  safety  set.  This 
cost-to-go  is  based  on  the  current  situation  awareness  of  the  en¬ 
vironment,  but  because  the  cost  only  needs  to  be  evaluated  at  a 
small  number  of  points  and  it  uses  a  coarse  model  of  the  aircraft 
dynamics,  it  can  be  updated  rapidly  as  the  environment  changes 
[30].  This  approach  avoids  the  potential  problems  of  simple  ter¬ 
minal  penalties,  such  as  the  distance  to  the  goal,  which  are  not 
cognizant  of  the  known  obstacles  in  the  environment,  and  thus 
can  lead  to  the  aircraft  becoming  trapped  behind  them. 

The  implementation  of  this  cost-to-go  function  has  two  main 
components:  cost  map  generation  and  cost  point  selection.  The 
cost  map  generation  is  based  on  the  observation  that  the  opti¬ 
mized  paths  (i.e.,  minimum  distance)  tend  to  follow  the  edges 
and  comers  of  the  obstacles.  Thus,  a  shortest  path  algorithm  is 
applied  to  a  graph-based  representation  of  the  environment  to 
estimate  the  approximate  cost  cp(rCOTnCT)  to  fly  from  each  ob¬ 
stacle  comer  rcorncr  to  the  target  of  vehicle  p.  Then,  only  the 
pairs  of  comer  locations  and  costs  are  stored  within  the  cost 
map.  The  receding  horizon  optimization  chooses  the  best  comer 
rAs  that  is  visible  from  plan’s  terminal  position  rk+N  |  k  of  the 
vehicle  p,  so  that  the  cost-to-go  function  is 

fP  (aG:  +  /V  |  fc)  =  rfc+A'|  it  ~  ^is  |2  +  (^is)  '  (20) 

Note  that  the  second  term  gives  an  estimate  of  the  cost  from 
the  selected  point  to  the  goal  along  a  collision  free  path.  This 
cost-to-go  function  enables  the  planner  to  use  the  environmental 
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information  beyond  the  detailed  plan,  leading  to  a  knowledge¬ 
able  trajectory.  The  discrete  decision  associated  with  the  comer 
selection  is  implemented  using  mixed  integer  linear  program¬ 
ming  (MILP)  [2].  The  results  presented  in  this  paper  uses  a  2-D 
version  of  the  cost-to-go  calculation,  but  this  has  been  extended 
to  3-D  in  [31]  and  to  account  for  the  typical  turning  behavior  of 
a  UAV  in  [32], 

Given  these  main  components,  the  overall  RSBK  algorithm  is 
summarized  as  follows.  From  the  optimal  solution  at  time  k,  the 
first  control  input  k  for  each  vehicle  is  applied  to  the  system 
(1).  At  the  next  time  k  +  1,  the  states  of  each  vehicle  ar£+1  are 
measured,  and  the  optimization  is  repeated  as  follows. 

1)  Compute  the  following  and  store. 

•  The  constraint  sets  yp  through  (17)  and  (18). 

•  A  cost  map  rcorner,  cp(rcorner)  that  can  be  used  to  eval¬ 
uate  the  cost-to-go  function  fp(  • ). 

2)  At  time  k,  measure  vehicle  states  x3‘k ,  and  formulate  a  MILP 
problem  using  the  stored  values  from  Step  1).  Then,  solve 
the  optimization  problem  P(aii)  shown  with  ( 8)— ( 1 4)  and 
(2°). 

3)  Apply  control  uk  =  uk*  k  from  the  optimal  sequence  to  the 
system  (1). 

4)  Increment  k.  If  the  knowledge  of  the  environment  changes, 
go  to  Step  1);  Otherwise,  go  to  Step  2). 

B.  Properties 

Theorem  3.1  ( Robust  Feasibility):  The  RSBK  algorithm 
maintains  feasibility  of  the  optimization  while  satisfying  all  of 
the  constraints  under  the  action  of  a  bounded  disturbance  (2),  if 
the  first  optimization  is  feasible. 

Proof:  It  can  be  shown  (see  Appendix  A)  that  feasibility  at 
time  k  ensures  that  a  particular  candidate  solution 

“*+j+i  |  *+i  =  “fc+j+i  |  k  +  ^7  (21) 

K+j+i  I  fc+1  =  K+j+1 1  h  +  v7  (22) 

“l+JV  I  k+ 1  =  kP  {^k+N  I  Jb+l)  (23) 

i^k+N+i  |  k+iP  =  ApiiFk+N  |  fc+1  -f  Bpx^.+n  1 1+1  (24) 

is  feasible  at  time  k  + 1,  and  hence  the  optimization  at  time  k  + 1 
must  be  feasible.  ■ 

Remark  1:  In  order  to  recursively  prove  robust  feasibility,  the 
algorithm  requires  the  existence  of  an  initial  feasible  solution. 
Because  the  algorithm  uses  a  short  planning  horizon  and  does 
not  require  the  vehicles  reach  the  goal  in  the  first  plan,  it  is  typ¬ 
ically  very  easy  to  find  an  initial  feasible  solution,  as  will  be 
shown  in  Experimental  Results,  Section  VI.  One  such  initial¬ 
ization  is  a  simple  loiter  pattern,  assuming  the  vehicles  are  far 
enough  apart  compared  to  the  diameter  of  the  loiter  circle.  This 
initialization  is  much  simpler  than  that  required  in  previous  ro¬ 
bust  multivehicle  MPC  algorithms  [16].  This  feature  will  also 
be  exploited  in  the  distributed  form  of  the  algorithm,  where  ini¬ 
tialization  can  be  a  significant  challenge. 

Remark  2:  In  contrast  to  the  nominal  safety  approach  [27] 
that  assumes  no  disturbance  (i.e.,  >VP  =  0),  the  algorithm  pre¬ 
sented  here  never  fails  to  find  a  feasible  solution  under  the  ac¬ 
tion  of  bounded  disturbances.  Furthermore,  the  number  of  con¬ 
trol  variables  is  the  same  as  the  nominal  algorithm.  By  over¬ 


bounding  the  Pontryagin  difference  operation  in  (15),  (17),  and 
(19),  the  algorithm  will  have  the  same  number  of  constraints 
[18]. 

Remark  3:  The  RSBK  algorithm  is  an  anytime  algorithm , 
that  is,  the  optimization  can  be  stopped  at  anytime.  In  such  a 
case,  however,  a  feasible  solution  is  always  available.  This  fol¬ 
lows  because  a  candidate  feasible  solution  can  be  always  con¬ 
structed  from  the  previous  feasible  (not  necessarily  optimal)  so¬ 
lution.  As  shown  in  Appendix  A,  the  calculation  of  a  candidate 
solution  is  simple  and  involves  1 )  shifting  the  previous  plan  by 
one  time  step,  2)  adding  a  disturbance  feedback  sequence,  and 
3)  appending  a  terminal  control  input  using  kp  at  the  terminal 
step  of  the  plan. 

IV.  Distributed  RSBK  Algorithm 

This  section  presents  a  distributed  version  of  the  RSBK  al¬ 
gorithm.  In  this  approach,  each  vehicle  solves  a  reduced  sub¬ 
problem  to  determine  its  control  inputs.  These  optimizations  are 
solved  in  sequence  and  the  distribution  is  achieved  by  having 
each  vehicle  exchange  its  plan  information  with  the  other  vehi¬ 
cles.  A  key  element  of  this  work  is  that  the  vehicles  must  only 
exchange  information  with  its  neighbors,  enabling  the  local  op¬ 
timization  to  be  based  on  local  information  [23],  This  is  im¬ 
portant  because  it  reduces  the  communication  requirements  and 
enables  the  groups  to  replan  faster. 

A.  Algorithm  Description 

The  basic  idea  is  to  include  only  the  vehicles  that  could  have 
direct  conflicts  with  the  vehicle  that  is  planning.  Fig.  2  shows 
an  example  with  three  aircraft.  Any  plan  of  the  vehicle  r  would 
not  have  conflict  with  p’ s  plan  because  they  are  far  apart.  On  the 
other  hand,  the  vehicle  q  could  have  a  conflict  with  p  if  both  p 
and  q  generate  their  plans  independently  and  move  towards  each 
other.  Therefore,  p’s  optimization  must  include  the  intention  of 
q ,  but  the  vehicle  r  could  be  disregarded. 

Before  presenting  the  algorithm,  several  aspects  of  the  nota¬ 
tion  are  defined.  First,  define  vehicle  p's  neighbor  2^  as  an  or¬ 
dered  set  of  vehicles  whose  plans  made  at  time  k  could  have  di¬ 
rect  conflicts  with  p’s  plan  made  at  time  k.  For  the  multivehicle 
collision  avoidance  problem,  the  neighborhood  of  the  vehicle  p 
is  defined  as  all  the  vehicles  within  distance  2 D  from  vehicle 
p’s  position,  where  D  is  the  maximum  plan  length  with  some 
maigin  and  is  given  by 

N 

D  =  B  Umax  ~  0tc)2kt,  +  (1  T  Q^V-l  +  2 p  (25) 

Jt=0 

with  At  being  the  sampling  time  of  the  discrete  time  system, 
d  being  the  size  of  the  vehicle,  p  being  the  radius  of  the  loiter 
circle,  being  the  margin  included  for  robustness  [18],  and 
Pk  being  the  constraint  tightening  margin  for  the  velocity,  whose 
analytical  calculations  are  given  later  in  (40).  The  dashed  line  in 
Fig.  2  shows  the  boundary  of  p's  neighborhood.  The  communi¬ 
cation  range  of  the  vehicles  is  assumed  to  be  larger  than  2D. 
All  obstacles  within  range  D  from  the  vehicle  are  assumed  to 
be  known.  Note  that  the  neighbor  set  is  a  function  of  time  fc,  be¬ 
cause  the  relative  position  of  the  vehicles  will  change  over  time. 
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Fig.  2.  Neighborhood  of  each  vehicle  is  shown  by  the  dashed  lines.  Each  plan 
terminates  in  a  safety  circle. 


The  set  1 %  also  determines  the  order  in  which  the  vehicles 
calculate  their  new  plans  sequentially,  although  Section  IV-E 
modifies  the  assumption  on  this  strict  ordering.  Let  pre(qj  de¬ 
note  the  vehicle  ordered  prior  to  the  vehicle  q,  and  pre(g)  de¬ 
note  the  vehicle  ordered  after  q.  The  first  and  the  last  element  of 
this  set  is  expressed  as  first  and  DrefXPl.  resnectivelv. 
With  the  definition  of  Z£,  we  know  a  priori  that  for  any  two  ve¬ 
hicles  p  and  q  with  q  £  Jk  (and  hence  p  <£  I®),  the  coupling 
avoidance  constraints  are  satisfied  for  all  steps  with  the  plans  of 
the  two  vehicles.  Hence,  even  if  the  subproblem  for  the  vehicle 
p  includes  only  the  plans  of  its  neighbors,  all  of  the  coupling 
avoidance  constraints  will  be  satisfied. 

The  result  of  this  analysis  is  that  only  a  subset  of  all  m  cou¬ 
pling  constraints  need  to  be  considered  in  each  subproblem.  De¬ 
fine  Cvk  C  {1, . . . ,  nc}  as  the  set  of  coupling  constraints  to  be  in¬ 
cluded  in  subproblem  p  at  time  k.  Then,  C[  =  {c6l,...,nc: 
3q  £  XI,  [E%  Eqc\  0}.  This  excludes  two  kinds  of  constraint 
irrelevant  to  p:  those  that  couple  p  to  the  vehicles  outside  its 
neighborhood  and  those  that  do  not  involve  p  at  all,  i.e.,  with 
Epc  =  0. 

Let  Qk  denote  a  vehicle  graph  whose  node  is  a  vehicle  and 
edge  connects  two  nodes  if  the  corresponding  vehicles  are 
neighbors.  If  Qk  is  a  disconnected  graph,  then  Qk  is  divided 
into  a  set  of  connected  subgraphs.  The  information  of  the 
neighbor  sets  1 \  is  shared  by  the  vehicles  in  the  connected 
graph  (or  subgraph  if  Qk  is  not  connected),  so  that  the  vehicles 
in  the  connected  graph  have  the  consistent  information  on  the 
planning  order.  This  can  be  done  using  only  the  inter-vehicle 
communication.  Note  that  the  vehicles  that  belong  to  different 
connected  graphs  do  not  need  to  exchange  information  because 
there  will  be  no  conflict  among  them. 


B.  Algorithm 

At  time  k,  the  pth  vehicle  generates  its  own  control  inputs 
up |  k  by  solving  the  following  optimization  subproblem  Pp(ar£): 


N- 1 


min 

u*(|fc),Q 


fc+j|A:’  “A-+j|  k 


3  =  0 


)+/*(*W)  (26) 


7v  ,  z,V 

* k+j  |  /e,c  ’  l  fc,c 


subject  to  Vj  :  Eq.  (9)— (12),  (14) 

{Apxp  +  Bpkp(xp)  £  Ql 
cpxp  +  dpkp{xp)  e  ypN_i 
Vc  e  K*'  e  Z.v-1, 

V  xp")e {QPo 


(27) 


Vxp  £  Ql 


(28) 


x  ■ 


•xQT}- 


The  term  k  c  is  a  summation  of  the  outputs  from  the 
neighbor  vehicles  and  is  constant  in  this  local  optimization. 
The  term  has  two  components  Vj,  Vc  £  Cl 


*k+j  |  fc,c  " 


E 


zq*  + 

Zk+j  I  k,c^ 


E 


ord(,)<ord(P) 


ord(,)>ord(P) 


zq* 

^  k+j  |  k~  1,C* 


(29) 


The  first  term  is  the  summation  over  the  vehicles  that  have  al¬ 
ready  planned  at  time  k.  The  second  term  is  for  the  vehicles 
that  have  not  planned  at  time  k ,  so  that  the  prediction  made  at 
( k  -  1)  is  used.  This  prediction  comes  directly  from  (12)  in  the 
optimization  Pq(xqk_1).  The  original  coupling  constraint  sets  Zc 
are  modified  in  the  following  manner,  dividing  the  tightening 
process  from  ( 1 9)  into  intermediate  stages  for  each  vehicle: 


Zp:c  -  Z(:  (30a) 

zPcre(9)  =Zl~EqcLqWqVj,  q£ll  q^p0  (30b) 
Z?U,c  =  ZZ  ~  WP„ ,  Vj"  (30c) 


withpo  =  first(2 k)  andp„  =  last(J^).  (30b)  tightens  the 
constraints  from  the  vehicle  q  to  pr e(q).  This  represents  that 
the  vehicle  pr e(q)  saves  some  margin  for  the  vehicle  q  so  that 
q  can  use  it  to  reject  the  disturbances  WT  (30c)  tightens  the 
constraints  from  the  prediction  step  j  to  (j  -I- 1).  This  represents 
that  the  optimization  at  time  k  for  vehicle  pn  saves  some  margin 
so  that  the  optimization  at  time  ( k  +  1)  for  vehicle  p0  can  use  it 
to  also  reject  the  disturbances. 

Note  that  each  vehicle  uses  a  nilpotent  controller  Kp  that 
gives  Lpn_  j  =  0,  and  (15)  is  not  included.  For  the  vehicles  that 
have  already  planned  at  time  k,  the  latest  solution  Qqk  is  used. 
For  the  vehicles  that  have  not  planned  Vg  £  {next(p), . . .  ,pn}, 
the  invariant  set  constructed  at  the  previous  step  is  used,  i.e., 

Ql  =  Ql- v 

The  full  DRSBK  algorithm  is  as  follows. 

1)  Find  a  feasible  solution  of  the  DRSBK  optimization 
starting  from  the  current  states  (see  Remark  4). 

2)  Set  k  =  1. 

3)  For  each  vehicle  p,  update  the  neighbor  set  2£. 

4)  For  each  vehicle  p,  in  a  predetermined  order  (e.g., 
1, . . . ,  n),  do  the  following. 

a)  Gather,  by  communication,  the  latest  plans  zq^k  c  or 
zq*k_1  c  from  its  neighbors  q  £  1%. 

b)  Measure  vehicle  states  a^. 

c)  Construct  a  cost  map  rrorner,  cp(rcorner). 

d)  Solve  subproblem  Pp(x£). 

5)  Apply  control  k  to  each  vehicle  p. 

6)  Increment  k  k  +  1  and  go  to  3. 

Note  that  this  algorithm  is  also  a  generalization  of  the  two  pre¬ 
viously  published  distributed  MPC  algorithms  [  1 8],  [23],  in  that 
it  includes  both  robustness  and  a  short  plan  that  does  not  neces¬ 
sarily  reach  the  target. 
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The  steps  1,  3,  and  4a-c  are  implemented  in  Matlab.  Be¬ 
fore  solving  each  subproblem  4d,  Matlab  forms  the  MILP 
constraints  using  both  the  static  parameters  such  as  vehicle  dy¬ 
namics  limit,  target  location,  and  constraint  tightening  margin, 
and  the  dynamically  updated  parameters  such  as  current  vehicle 
states,  obstacle  boundaries,  and  other  vehicles’  plans.  More  de¬ 
tails  on  the  MILP  implementation  are  shown  in  Appendix  B, 
Then,  the  MILP  solver  CPLEX  is  invoked  in  step  4d.  The  opti¬ 
mized  control  states  and  inputs  are  extracted  from  CPLEX  into 
MATLAB  and  is  sent  to  the  vehicle  in  step  5.  Note  that  in  the 
hardware  experiment  Section  VI,  Matlab  receives  the  mea¬ 
sured  states  from  the  vehicle  in  step  4b. 

C.  Robust  Feasibility 

Even  though  each  subproblem  only  uses  local  information, 
the  robust  feasibility  of  the  entire  fleet  can  be  proven  using  an 
approach  that  parallels  [17]. 

Theorem  4.1:  If  feasible  solutions  to  all  subproblems 
P1(x^), . . . ,  Pn(ar£)  can  be  found  at  time  k,  then  the  subprob¬ 
lems  at  the  future  times  t  >  k  are  feasible  under  the  action  of 
disturbances. 

Proof:  The  proof  is  based  on  a  recursion  and  similar  to  the 
proof  of  Theorem  3. 1  in  Appendix  A.  Without  loss  of  generality, 
the  planning  order  is  assumed  to  be  1, 2, . . . ,  n. 

1)  Assume  all  the  subproblems  Pp(x£)  have  a  feasible  solu¬ 
tion  at  time  k. 

2)  Then,  it  can  be  shown  that  a  feasible  solution  exists  to  the 
first  subproblem  P1(a:^+1)  at  time  k  +  1  for  all  distur¬ 
bances  w\  acting  on  the  vehicle  1  despite  the  change  in 
the  neighbor  set  1\+v  This  is  done  by  showing  that  the 
following  candidate  solution  is  feasible 

“fc+j+i 1  fc+1  =  “fc-hj+1 1  k  +  K)L)w  l*  V j-  (31) 

*l+j+ 1 1  k+ 1  =  4+y+i  |  k  +  L)™\  V?  (32) 

“fc+AJ|it+l  =  (^fc+A^fc+l)  (33) 

^fc+W+1  |  fc+1  —  A1!  )k+N  I  fc+1  +  ^“fc+IV  I  fc+1-  (34) 

3)  Under  the  assumption  in  Step  1,  it  can  be  shown 
that  given  any  solution  to  the  problem  Pp(x^!  +  1 )  for 
p  £  {l,...,n  —  1},  the  next  subproblem  Pp+1(x£*j) 
is  feasible,  by  showing  the  feasibility  of  a  candidate 
sequence.  Similar  to  (3 1 ) — (34)  in  Step  2,  the  candidate 
solution  is  constructed  by  shifting  the  previous  plan  for 
vehicle  p  +  1,  assumed  known  in  Step  1,  by  one  time  step 
and  adding  a  perturbation  sequence  using  the  predeter¬ 
mined  controller  Kp+  . 

Therefore,  at  k  +  1,  all  subproblems  P1  ,  Pn(sJ.*+1) 

are  feasible.  ■ 

D.  Remarks 

Remark  4:  Simple  Initialization:  Initializing  this  algorithm 
requires  the  other  vehicles’  previous  solution,  as  shown  in  (28) 
and  (29).  However,  a  simple  initialization  technique  such  as 
loiter  circle  can  be  used,  as  discussed  in  Remark  1  of  the  RSBK 
algorithm. 

Remark  5:  Scalability.  If  each  subproblem  includes  the  in¬ 
teractions  with  all  the  other  vehicles,  as  in  [  1 8],  the  number  of 


Fig.  3.  Output  of  Brelaz’s  heuristic  algorithm  for  vertex  coloring.  Each  node 
represents  a  vehicle,  while  each  line  connecting  two  nodes  represents  that  they 
are  neighborhood.  The  number  is  a  group  label  for  the  vehicle  and  vehicles  with 
the  same  group  label  can  compute  simultaneously. 

constraints  grows  rapidly  with  the  size  of  the  fleet,  which  would 
increase  the  problem  complexity.  The  algorithm  presented  here 
only  requires  the  information  about  its  neighbors,  resulting  in  a 
more  scalable  approach.  Furthermore,  each  vehicle  only  needs 
the  information  from  its  neighbors,  so  that  the  algorithm  re¬ 
quires  much  less  communication  bandwidth. 

E.  Simultaneous  Computation 

This  section  removes  the  assumption  on  the  strict  ordering 
and  enables  simultaneous  computation  among  vehicles. 

Theorem  4.2:  Two  vehicles  p  and  q  can  generate  trajectories 
simultaneously  without  causing  infeasibility  in  the  algorithm  if 
p  £  Iq  (and  hence  q  f  Xv). 

Proof:  By  the  definition  of  neighbor  Xv  and  X",  the  plans 
for  p  and  q  have  no  conflict.  Given  an  arbitrary  vehicle  r(f 
p ,  q),  both  optimizations  by  p  and  q  ensure  that  the  same  can¬ 
didate  plan  similar  to  (3 1)— (34)  for  each  vehicle  r  is  feasible. 
Thus,  when  p  and  q  calculate  simultaneously,  the  vehicle  r  has 
a  feasible  solution  at  the  next  optimization.  ■ 

By  applying  this  theorem  to  pairs  of  vehicles  in  the  fleet,  it  can 
be  shown  that  more  than  two  vehicles  can  perform  optimization 
simultaneously.  The  vehicles  that  compute  simultaneously  are 
grouped  together,  and  the  number  of  vehicles  that  compute  si¬ 
multaneously  is  to  be  maximized.  This  grouping  problem  is  cast 
as  a  vertex  coloring  problem  on  the  vehicle  graph  Qk,  where 
each  vertex  represents  a  vehicle  and  vertices  are  connected  if 
they  are  neighbors.  The  goal  is  to  color  all  the  vertices  with  a 
minimum  number  of  colors  while  using  different  colors  for  ad¬ 
jacent  vertices.  Brelaz’s  heuristic  algorithm  [33]  is  used  here 
because  it  provides  good  solutions  very  rapidly.  Vehicles  of  the 
same  color  are  in  one  group  and  can  compute  their  solutions  si¬ 
multaneously. 

Fig.  3  shows  a  simple  example  where  Brelaz’s  algorithm  is 
applied  to  a  graph  of  ten  vehicles.  Note  that  in  order  to  color  the 
vehicles,  the  location  of  all  the  vehicles  in  the  connected  graph 
must  be  known.  A  central  ground  station  can  be  introduced  to 
run  the  grouping  algorithm  and  determine  the  planning  order. 
Alternatively,  the  vehicles  can  obtain  this  information  by  com¬ 
municating  only  locally  through  neighbors.  Then,  Steps  3)  and 

4)  of  the  DRSBK  algorithm  in  Section  IV-B  are  modified  to  the 
following. 

3)  Ground  station  receives  vehicle  positions  r£,  runs  the 
grouping  algorithm,  and  determines  the  planning  order. 
Each  vehicle  updates  the  neighbor  set  X£. 

4)  For  each  group,  do  the  following  simultaneously  for  all 
vehicles  p’s  in  the  group. 
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V.  Simulation  Results 


A.  Vehicle  Model 

A  point-mass  dynamics  model  is  used  to  approximate  trans¬ 
lational  dynamics  of  rotorcraft  and  fixed-wing  aircraft 


’**+1' 

-  Ap 

1^1 

A+i. 

kJ 

+  Bpap  +  Wpk 


with  AF  — 


Bp  = 
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02 


A  tl2 
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L  Af/2  J 


where  rp,vp,  and  ap  are  the  position,  the  velocity,  and  the  ac¬ 
celeration  vector,  respectively.  Matrices  /2  and  O 2  express  an 
identity  matrix  and  a  zero  matrix  of  size  2,  respectively.  The  dis¬ 
turbance  enters  through  the  input  acceleration  and 


€  W={w\w  =  Bpn,n  <=  7?2,  ||n||oo  <  wmax}.  (35) 


The  local  constraints  include  the  obstacle  avoidance,  the  max¬ 
imum/minimum  speed,  and  the  maximum  input  constraints 


^min  ^  KII2  —  ^max 
1 1  1 1 2  —  ®max 

where  O  C  TZ2  expresses  the  no-fly  zones,  and  umln,  i;max, 
Umax  are  the  minimum  speed,  maximum  speed,  and  maximum 
acceleration  of  the  vehicle.  A  two-step  nilpotent  controller  Kp 
for  this  system  is  Kp  =  [-(l)/(Af2)/2, -(3)/(2Af)J2],  which 
enables  the  use  of  nominal  invariant  set  as  a  safety  set.  Per¬ 
forming  constraint  tightening  ( 1 7)  gives  the  following  constraint 
set  [181: 


Fmin  +  ftj  ^ 


VP 

K+j  |  k 


5:  3Imax  ftj 


k+j  |  k 


f  Umax  7j 


(36) 

(37) 

(38) 


where  constraint  contraction  parameters  a,  ft,  and  7  are  defined 
in  (40)  in  Appendix  B.  The  set  B  represents  a  2-D  unit  box,  i.e., 
B  =  {x  €  TZ2  |  HU  <  1}.  Note  that  (36)  expands  the  no-fly 
zones  to  guarantee  robust  feasibility.  The  cost  map  calculation  is 
based  on  the  expanded  obstacles  O  ®  B.  The  inter- vehicle 
avoidance  constraints  in  p’s  optimization  are  written  for  each 
q  £  as 


B.  Long  Trajectory  Generation  for  Single  Vehicle 

The  first  example  demonstrates  that  the  RSBK  algorithm 
for  a  single  vehicle  can  design  a  very  long  trajectory  without 
computational  issues.  In  this  example,  the  rotorcraft  is  used, 
which  does  not  have  the  minimum  speed  constraint.  The  ve¬ 
hicle  parameters  are:  Af  =  2.6  s,  N  =  6,  vmax  —  0.5  m/s,  and 
amax  —  0.17  m/s2.  The  invariance  of  the  set  Qk  is  guaranteed 
by  imposing  the  following  hovering  safety  constraints  in  the 
optimization 


Xk+N  +  l  |  fc  =  &k+N\k 
rk+N\k  $  O  ®  Oin-iB. 


The  second  equation  ensures  that  the  hovering  location  is  col¬ 
lision  free.  The  target  region  is  far  from  the  initial  vehicle,  and 
to  solve  this  problem  by  planning  all  the  way  to  the  goal  would 
require  a  horizon  of  at  least  30  steps.  This  would  require  im¬ 
practical  computational  effort  for  real-time  applications  and,  as 
discussed  earlier,  would  be  inefficient  due  to  the  uncertainty  in 
the  far  future. 

Fig.  4  shows  trajectories  generated  under  three  different  dis¬ 
turbance  levels  ?/>max  =  0, 0.1amax,  0.2amax.  In  all  cases,  the 
RSBK  algorithm  guided  the  vehicle  to  the  target,  and  the  av¬ 
erage  computation  time  was  less  than  0.2  s.  When  the  distur- 

Konop  Ip»vp»1  ic  1  nQk  rtf  fhp  pnntrnl  tiiifhrtrifu  Iroiprfnn/  ic  ci m_ 
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ilar  to  the  one  with  no  disturbance.  However,  when  the  distur¬ 
bance  level  is  raised  to  20%  of  the  control  authority,  the  vehicle 
takes  a  different  route  because  the  passage  in  the  middle  of  the 
figure  used  by  the  other  plans  is  too  narrow  to  pass  through  ro¬ 
bustly.  A  cost-to-go  calculation  based  on  the  robustified  envi¬ 
ronment  O  ©  ojv-i®  does  not  allow  the  vehicle  to  enter  the 
narrow  passage  where  the  vehicle  could  violate  the  collision 
avoidance  constraints  due  to  a  strong  disturbance. 

Note  that  the  vehicle  moves  slowly  when  the  disturbance  is 
strong,  as  it  is  expected  intuitively.  Because  more  margin  must 
be  saved  to  reject  a  stronger  disturbance,  less  control  authority 
can  be  used  when  generating  the  trajectory.  The  hovering  state 
used  as  a  terminal  invariant  set  requires  the  vehicle  be  able  to 
stop  at  the  end  of  each  plan  using  the  small  control  authority 
available  in  the  prediction.  Table  I  summarizes  this  result.  The 
average  speed  becomes  significantly  smaller  when  the  distur¬ 
bance  level  is  increased  from  10%  to  20%.  The  number  of  steps 
it  takes  to  reach  the  target  set  is  significantly  longer  with  the  20% 
disturbance  level,  partly  because  of  the  longer  route  it  chooses, 
but  mainly  due  to  the  reduced  speed. 


-  rq * 

k+j  |  k  'k+j  |  k 
v  q* 

’k+j]k  Tk+j\k-l 


>2 d  +  2a j,  if  ord(g)  <  ord(p) 

>  2d  T  Oj  -f  Oj+i, 

if  ord(<7)  >  ord(p) 


where  r7*  are  sent  from  p's  neighbors.  The  terminal  safety  sets 
must  not  overlap  with  each  other,  as  shown  in  (28),  so  that 
the  sets  Qqk  (Vq  f  p)  are  treated  as  no-fly  zones  after  time  step 
k  +  N  —  1  in  the  optimization  P;’(xj!).  These  nonconvex  avoid¬ 
ance  constraints  are  implemented  using  MILP.  More  details  are 
found  in  Appendix  B. 


C.  Multi-UAV  Scenarios 

The  second  set  of  simulations  used  homogeneous  fixed-wing 
UAVs.  The  maneuver  limit  of  the  vehicle  is  given  by  ?;min  = 
I8tn/s,(;max  =  24  m/s,  amax  =3. 84  m/s2.  The  disturbance  mag¬ 
nitude  w;max  is  5%  of  the  control  authority  amax.  The  planning 
horizon  length  N  is  5.  Fixed-wing  UAVs  have  minimum  speed 
limit,  and  a  safety  loitering  circle  is  used  as  a  terminal  invariant 
set  [23].  For  simplicity,  in  this  section,  the  simultaneous  com¬ 
putation  is  implemented  as  a  sequential  computation  on  a  single 
computer  but  in  the  same  simulation  time  step.  Section  VI  shows 
the  real-time  experimental  results  using  multiple  computers. 
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Fig.  4.  Trajectories  generated  by  the  RSBK  algorithm.  The  vehicle  starts  at  the 
right,  and  the  goal  is  marked  with  o.  The  obstacles  are  expanded  in  the  RSBK 
calculation  to  account  for  the  avoidance  check  imposed  only  at  discrete  time 
steps,  (a)  No  disturbance,  (b)  Disturbance  level  10%.  (c)  Disturbance  level  20%, 


TABLE  I 

PERFORMANCE.  COMPARISON  FOR  THREE  DISTURBANCE  LEVELS 


(b) 


(c) 


Fig.  5.  Four  vehicle  scenario.  The  goal  points  are  shown  with  □  with  the  cor¬ 
responding  vehicle  index,  (a)  Trajectories,  (b)  Snapshot  at  time  5.  Note  that 
squares  containing  safety  circles  do  not  overlap  with  each  other,  (c)  Snapshot  at 
time  8,  showing  the  successful  avoidance  maneuvers. 


Disturbance 

level 

Average 

speed 

Steps 

0  % 

0.50  m/s 

26 

10  % 

0.44  m/s 

30 

20% 

0.28  m/s 

48 

The  DRSBK  algorithm  was  tested  in  the  following  two  sce¬ 
narios.  The  first  scenario  uses  four  vehicles  with  vehicle  avoid¬ 
ance  constraints.  Fig.  5(a)  shows  the  entire  trajectories.  Goals 


are  marked  with  □  together  with  the  corresponding  vehicle  in¬ 
dices.  Fig.  5(b)  shows  the  plans  made  at  time  k  =  5.  The  rec¬ 
tangle  in  dashed  lines  shows  a  safety  region  where  the  safety 
circle  is  contained  and  the  other  vehicles  cannot  enter  after  time 
k  +  N.  Note  that  the  plan  of  the  vehicle  4  (marked  with  a  star) 
aims  for  the  comer  (marked  with  a  o)  of  this  rectangle  of  the 
vehicle  I  because  this  comer  is  in  the  cost  map.  As  shown  in 
Fig.  5(b)  and  (c),  it  is  acceptable  for  the  plan  of  one  vehicle  to 
pass  through  the  safety  region  for  another.  The  terminal  set  (28) 
only  requires  that  the  safety  regions  do  not  overlap  each  other. 
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Fig.  6.  Ten  vehicle  scenario  with  four  obstacles,  (a)  Trajectories  of  all  the  ve¬ 
hicles.  (b)  Graph  representation  of  neighborhood  at  time  1 4.  (c)  Time  history  of 
the  vehicle  grouping. 


The  second  scenario  is  much  more  complicated  and  involves 
ten  vehicles  and  four  obstacles.  Fig.  6(a)  shows  the  trajectories 
of  all  ten  vehicles.  Although  computation  was  done  on  one  pro¬ 
cessor  in  this  section,  the  grouping  algorithm  was  included  to 
investigate  the  potential  for  speed-up  by  simultaneous  compu¬ 
tation.  The  hardware  experiments  presented  in  Section  VI  use 
one  processor  for  each  vehicle.  Fig.  6(b)  shows  a  snapshot  of 
the  vehicle  locations  (marked  with  a  o)  at  time  k  =  14.  The 
neighbors  are  connected  by  the  lines  and  each  vehicle  is  labeled 
with  a  color/group  number.  Note  that  no  two  vehicles  connected 


TABLE  II 

Average  Computation  Time  (s)  of  Each  Subproblem 


Scenario 

Cost  map 
calculation 

Optimization 

(MILP) 

4  veh 

0.04 

0.21 

10  veh  (local  comm.) 

0.21 

0.25 

10  veh  (full  comm.) 

0.21 

0.37 

to  each  other  have  the  same  group  number.  The  vehicles  in  the 
same  group  can  simultaneously  solve  their  optimization  without 
any  conflict  in  their  trajectories.  Fig.  6(c)  shows  the  time  history 
of  the  number  of  colors  required  for  grouping  the  vehicles.  The 
number  of  groups  is  low  when  the  vehicles  are  far  apart,  but  as 
might  be  expected,  this  increases  to  six  in  the  middle  of  the  mis¬ 
sion  when  the  vehicles  are  in  close  proximity. 

Table  II  shows  the  average  computation  time  for  these  sce¬ 
narios.  The  cost  map  calculation  was  done  in  MatlaB,  and  the 
MILP  optimization  was  solved  using  CPLEX  9.0  on  a  Pentium 
IV  3.2-GHz  machine  with  1  GB  of  RAM.  The  computation  time 
of  the  cost  map  calculation  grows  with  the  number  of  vehicles 
because  a  higher  number  of  loiter  circles  means  that  more  ob¬ 
stacles  must  be  considered.  In  order  to  demonstrate  the  effect 
of  using  only  the  local  information,  the  ten-vehicle  scenario  is 
tested  also  with  a  case  where  each  vehicle  includes  all  other 
vehicles  as  neighbors  with  full  communication.  The  last  two 
rows  of  Table  II  illustrate  that  DRSBK  with  local  communica¬ 
tion  solves  the  problem  much  faster.  The  output  of  the  grouping 
algorithm  is  used  to  enable  simultaneous  computation,  and  the 
number  of  groups  that  must  compute  sequentially  is  ten  in  the 
full  communication  case,  as  opposed  to  six  in  the  local  commu¬ 
nication  case.  This  indicates  the  local  communication  architec¬ 
ture  reduces  the  fleet  computation  time  further  by  40%  in  this 
scenario. 

VI.  Experimental  Results 

This  section  presents  experimental  results  of  the  DRSBK  al¬ 
gorithm  on  the  multivehicle  testbed.  The  hardware  demonstra¬ 
tions  introduce  realistic  features  such  as  computation  and  com¬ 
munication  time  delays  and  prediction  errors  that  naturally  arise 
from  the  various  sources  of  uncertainty  in  the  system,  including 
the  tracking  errors  from  the  low-level  waypoint  follower  and 
modeling  errors  of  the  vehicle  dynamics.  These  implementation 
challenges  must  be  addressed  by  the  algorithm  in  order  to  suc¬ 
cessfully  generate  trajectories  online. 

A.  Testbed  Setup 

Fig.  7  shows  the  testbed  setup  with  the  indoor  positioning 
system  from  ArcSecond  Constellation  3-D-i  and  Pioneer  3-AT 
from  ActivMedia  Robotics.  In  order  to  demonstrate  the  online 
distributed  computation  amongst  the  vehicles  in  the  fleet,  each 
rover  has  two  laptops,  as  shown  in  Fig.  7(a).  A  small  “control” 
laptop  performs  the  navigation  and  low-level  vehicle  control 
tasks,  and  a  2.4-GHz  “planning”  laptop  performs  the  DRSBK 
computation  using  a  combination  of  Matlab  and  CPLEX. 

The  control  laptop  runs  an  estimator  for  the  position  and  the 
velocity  estimate  of  the  vehicle.  For  practical  implementation, 
instead  of  applying  the  acceleration  command  m*  as  in  Algo¬ 
rithm  step  5,  the  onboard  planner  sends  the  optimized  trajectory 
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Fig.  7.  Multirover  testbed,  (a)  Photos  of  the  hardware,  (b)  Testbed  architecture. 


Fig.  8.  Timing  of  the  DRSBK  algorithm  on  the  testbed. 


to  the  control  laptop,  which  generates  wheel  speed  commands 
lor  the  rover.  A  nonlinear  guidance  law  [34]  is  used  to  imple¬ 
ment  a  trajectory  tracking  controller,  which  runs  at  a  faster  rate 
than  the  DRSBK  controller.  This  represents  an  apportionment 
of  uncertainty  in  the  problem,  with  the  low-level  handling  fast 
dynamics  and  the  high-level  handling  uncertainty  in  the  envi¬ 
ronment,  collision  avoidance,  and  residual  tracking  errors.  The 
ground  station  laptop  runs  a  grouping  algorithm  at  each  time 
step,  but  all  DRSBK  calculations  are  done  onboard,  as  shown 
in  Fig.  8.  Each  planning  laptop  communicates  its  local  solu¬ 
tion  with  its  neighbors  using  the  802. 1 1  a  wireless  LAN.  For  this 
testbed,  the  inter-vehicle  communication  is  facilitated  using  an 
access  point  connected  through  an  Ethernet  cable  to  the  ground 
station  laptop. 

Fig.  8  shows  the  timing  of  the  experimental  setup.  This  ex¬ 
ample  has  three  vehicles  in  two  groups  where  the  vehicles  1  and 
2  compute  simultaneously.  The  control  input  of  each  vehicle  is 
implemented  using  fixed  discrete  time  steps.  The  planner  takes 
a  measurement  (I)  and  propagates  forward  the  measured  states 


using  the  nominal  model  to  predict  the  initial  states  x*,  of  the 
plan.  This  propagation  compensates  for  the  system  delay  that 
results  from  the  computation  time  /comp,  the  communication 
delay  /comm,  and  the  actuation  delay  /delay •  It  then  computes 
the  optimal  control  input  (II)  and  waits  until  the  control  update 
time  (III).  The  step  size  A t  between  time  step  k  and  k  +  l  was 
2.8  s  for  two-rover  cases  and  3.5  s  for  three-rover  cases. 

A  typical  experimental  run  starts  by  commanding  the  vehicles 
to  drive  straight  in  the  initial  heading  direction.  After  1 .5  s,  the 
first  vehicle  takes  its  measurement  and  the  DRSBK  loop  starts. 
For  other  vehicles  that  have  not  made  any  plans,  loiter  circles 
starting  from  their  current  states  are  used  as  their  initial  feasible 
plan,  as  mentioned  in  Remark  4.  This  demonstrates  the  online 
initialization  capability  of  this  algorithm. 

Given  the  applications  of  interest  are  multi-UAV  coordination 
problems,  the  rovers  have  been  modified  to  emulate  the  motion 
of  a  UAV  in  2-D.  In  particular,  the  vehicles  are  constrained  to 
a  maximum  speed  ?;max  —  0.25  m/s,  a  minimum  speed  vmi„  — 
0.044  m/s,  and  a  minimum  turning  radius  rmin  =  0.9  m.  The 
vehicle  size  is  d  =  0.25  m.  The  planning  horizon  length  is  three 
steps.  The  disturbance  v?k  is  assumed  to  enter  into  position  and 
velocity  separately 

ufkeW  =  {wen4\  ||[J2,02]u>||2  < 

||[02, 12MI2  ^  wv„,»x}  • 

Extensive  testing  of  the  vehicle  on  different  types  of  flooring 
indicated  that  the  prediction  errors  due  to  the  uncertain  vehicle 
dynamics,  navigation  errors  and  external  disturbances  are  ap¬ 
proximately  uvmax  =  15  cm  and  wVmxx  —  5  cm/s.  Due  to  the 
tightened  constraints,  the  speed  is  constrained  to  be  0.14  m/s 
<  v  <0.15  m/s  after  N  =  3  steps. 

B.  Results 

Scenarios  are  constructed  to  highlight  several  features  of 
DRSBK  algorithm:  on-board  laptops  generate  trajectories  on¬ 
line,  which  shows  the  computational  advantages  for  real-time 


Fig.  9.  Two  vehicle  experiment  results.  The  arrows  show  the  initial  heading  directions  of  the  vehicles.  The  goals  are  marked  with  a  □.  (a)  Run  #1.  (b)  Run  #2. 
(cl  Run  #3.  (d)  Run  #4. 


applications;  the  vehicles  are  required  to  maneuver  in  a  con¬ 
strained  environment,  which  demonstrates  the  robust  feasibility 
under  the  action  of  disturbances;  plans  based  on  distributed  com¬ 
putation  can  satisfy  the  coupling  collision  avoidance  constraints. 

Test  1:  The  first  experiments  were  designed  to  test  the  ob¬ 
stacle  and  vehicle  avoidance  using  two  rovers.  During  the  first 
few  steps  in  each  run,  the  separation  between  the  two  vehicles 
was  more  than  2-D  =  6.34  m,  and  the  onboard  computers  op¬ 
timize  trajectories  simultaneously.  However,  as  they  move  to¬ 
wards  each  other,  the  planning  horizons  overlap,  and  they  must 
compute  the  solutions  sequentially.  For  the  purposes  of  these 
demonstration,  the  experiment  is  terminated  once  the  vehicle 
avoidance  and  obstacle  avoidance  maneuvers  are  completed. 
Fig.  9  shows  four  runs  performed  on  this  testbed.  DRSBK  algo¬ 
rithm  maintained  feasibility  under  the  action  of  the  disturbances, 
and  all  runs  show  the  robust  vehicle  avoidance  and  obstacle 
avoidance  based  on  the  online  distributed  trajectory  generation. 

Test  2:  The  second  set  of  runs  examines  vehicle  avoidance 
maneuvers  using  three  rovers  that  are  forced  to  execute  a 
crossing  pattern.  Fig.  10  presents  the  executed  trajectories  for 
three  runs  with  different  initial  locations  and  headings.  Note 
that  the  resolution  strategies  differ  with  the  scenario.  One  of 
the  key  features  of  MILP  is  that  it  handles  the  nonconvexity 
directly  and  looks  for  solutions  on  all  sides  of  obstacles  and 


conflicts.  This  example  illustrates  that  DRSBK  is  making  use 
of  this  functionality,  as  opposed  to  other  methods  that  could 
simply  refine  the  initial  guess  that  are  given.  In  these  scenarios, 
vehicles  1  and  3  are  initially  neighbors  because  they  are  closer 
than  2D  =  6.54  m,  and  thus  compute  sequentially.  Vehicle  2 
is  initially  independent  of  that  pair,  and  thus  solves  for  its  plan 
simultaneously  with  one  of  them.  However,  since  the  vehicles 
are  crossing,  vehicle  2  joins  the  pair  after  a  few  time  steps,  and 
then  all  three  vehicles  compute  sequentially.  Once  the  vehicles 
finish  the  avoidance  maneuver  near  the  middle  of  the  figure,  the 
group  breaks  up  as  the  vehicles  move  apart  and  starts  solving 
for  the  plans  simultaneously  again.  The  results  demonstrate 
online  dynamic  grouping  and  regrouping  of  the  vehicles  using 
the  algorithm  in  Section  IV-E. 

VII.  Conclusion 

This  paper  presented  a  new  distributed  robust  MPC  algorithm 
for  multivehicle  trajectory  optimization.  The  approach  extends 
previous  results  to  ensure  robust  feasibility  without  having  to 
plan  all  of  the  way  to  the  goal  and  with  only  communicating  the 
plans  within  a  local  neighborhood  rather  than  the  entire  fleet. 
This  two  new  features  greatly  reduce  the  computation  effort  and 
facilitate  a  significantly  more  general  implementation  architec¬ 
ture  for  the  distributed  trajectory  optimization.  Experimental  re- 
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Fig.  10.  Three  vehicle  experiment  results.  The  arrows  show  the  initial  heading 
directions  of  the  vehicles.  The  goals  are  marked  with  □.  (a)  Run  #1 .  (b)  Run  #2. 
(c)  Run  #3. 


suits  on  a  multivehicle  testbed  demonstrate  many  advantages  of 
this  algorithm  including  online  distributed  optimization,  simul¬ 
taneous  computation,  and  the  robust  feasibility  against  the  dis¬ 
turbances  in  the  real  environment. 


Appendix 


A.  Proof  of  Theorem  3. 1 

The  proof  of  Theorem  3.1  is  based  on  a  recursion  [17].  Once 
a  feasible  solution  up’k.x'^h  to  the  problem  P(xfc)  is  obtained 


at  time  k,  a  candidate  solution  to  P(x*+i)  at  time  k  +  1  is  con¬ 
structed  from  (21)-(24).  Note  that  the  disturbance  realization 
vfk  at  time  k  is  available  at  time  (k  +  1).  Using  that  the  fea¬ 
sible  solution  t Lp*k,xp*\k  satisfies  constraints  ( 1 0)— ( 1 4),  it  can 
be  shown  that 


AVtfc+j  +  \  I  fc+1  +  BPK+j+i  |  k+1 

=  AV/+i+1 ,  t  +  5V/+J+1 1  fc  +  (AP  + 

—  ^k+j+2\k  +  l'  Vj 

satisfying  (10)  at  k  +  1.  Similarly,  (1 1)  at  time  k  +  1  is 

^  ^k+j+1  |  k+l  +  D'“J«+.  |'fc+l 

=  c<+i+i  I  k  +  ^T+i+i  |  k  +  {CF  +  I?K>)LW 

ey*,  v.r 

since  it  can  be  shown  using  (17)  that 
vT+j+Uk  €  yP+v 

It  can  be  shown  using  (19)  that  the  coupling  constraints  (13)  at 
time  k  +  1  is  satisfied 

|  SP=i  Ec*k+j+ i  |  /t+i  =  SP=i  (EW'+j+llk+LW) 

)  1^P=  1  £‘cxk+j+ 1 1  k  fc  + 

€  Wp 

n 

|  fc+i  ^ 

P=i 

For  the  terminal  constraints  (14)  at  k  +  1 


qi=k~lpn-iWp 

^■'ric+Af  1  it+l  'H 

^*l+iV+i  I  it+1  +  l*47  €  Vte  €  Wp 

^^fc+TV+l  |  fc+1  e  Ql- 

Last,  the  following  shows  the  output  constraint  for  the  terminal 
step  is  satisfied: 

^k+N  |  A:  4-1  S  ^7b 
^  tfc+N  +  l  |  k+l  =  GP^k  +  N  |  fc+1 

+  DP^P{K+N\k+l)^yPN-V 

Therefore,  the  candidate  solution  satisfies,  under  the  bounded 
disturbance  w’k  G  Wp.  all  the  output  constraints  y1-  and 
coupling  constraints  Zhc  at  time  k+l,  and  the  terminal  state 
ir^+A,+i  I  fc+i  lies  in  the  set  Qpk.  Thus,  feasibility  at  time  k 
guarantees  feasibility  at  time  k  +  l  under  the  action  of  bounded 
disturbance.  If  the  first  optimization  P(xo)  is  feasible,  then  all 
the  future  optimizations  will  be  feasible.  ■ 


B.  MILP  Implementation  of  DRSBK  Online  Optimization 

This  Appendix  shows  the  detailed  MILP  implementation  of 
DRSBK  algorithm.  The  disturbance  is  assumed  to  be  infinity- 
norm  bounded  here,  i.e.,  Wp  -  {Gw  |  HmjHoo  <  wmax}- 
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1)  Constraint  Tightening  for  Robustness:  The  constraint 
tightening  in  (17)  and  (30a)-(30c)  are  implemented  using  the 
following  constraint  contraction  parameters  [18]: 

a0  =  0,  a,  =  otj- i  +  ||[1000]L^_1BpG'||i  wmax 
Pa  =  0,  Pi  =  0^  +  C  || [00 1 0\I^_1BpG\\l  u>max 
7o  =  0,  t i  =  7,-1  +  C  ||[10]/Tp_1Lp_1BpG||1wmax 

j  >  1  (40) 


One  advantage  of  MILP  is  that  the  optimization  can  consider  the 
entire  range  of  vehicle  dynamics  allowed  by  these  constraints. 

3)  Invariance  Constraints  ( 14):  From  the  terminal  states,  the 
vehicle  has  an  option  to  enter  a  left  or  right  loiter  circle.  The 
centers  of  the  left  and  right  safety  circles  are 

°l  =  <+N  |  *  +  R  (  2  )  rw  -  '  fc  ^ 

°Pn=rlUN\k+n(-  2)  w <44b) 


where  aij,/3j,  and  7,  respectively  represents  the  constraint  con¬ 
traction  for  position,  velocity,  and  input  for  the  jth  prediction 
step.  The  coefficient  C  —  1  when  the  constraint  set  (37)  and 
(38)  and  the  disturbance  set  are  both  two-norm  bounded.  How¬ 
ever,  C  —  \/2  when  performing  the  Pontryagin  difference  be¬ 
tween  a  two-norm  bounded  set  and  the  infinite-norm  bounded 
disturbance  set  W  in  (35).  This  is  because  W  has  the  maximum 
magnitude  of  the  length  s/2wm.M  in  the  diagonal  directions. 

2)  Output  Constraint  Set  (II):  The  obstacle  avoid¬ 
ance  constraints  use  binary  variables.  For  each  point 
rl+J  |  k  =  K +j  |  k’Vk+j  |  kf  and  each  rectangular  shaped  ob¬ 
stacle  defined  by  two  comers  [xiow,  j/iow]7  and  [Thigh,  2/high]7, 
the  avoidance  constraints  can  be  expressed  as 


Vo,  Vj  : 

xp  ,  <  x\„w  —  a-i  +  Mtfh  > 

K~tJ  j  fc  —  "  '  '0 

(41a) 

2/fc+j  |  fc  -  2/low, „  ~  +  ^Kbst,Jo2 

(41b) 

■rfc+j  |  fc  —  Xhigh,0  +  aj  —  •^’obst,J.o3 

(41c) 

2/fc+,  |  fc  —  2/high, „  +  ay  —  MbPhat^ 

(4  Id) 

-;obst,Jol  <  3 

1=1 

(41e) 

where  M  is  a 

large  number  to  relax  the  constraints  in 

(4 1  a)— (4 Id),  and  o  denotes  the  index  of  the  obstacle.  The 
logical  constraint  (41e)  requires  at  least  one  constraint  in 
(41a)-(41d)  be  active.  Note  that  the  parameter  aj  tightens  the 
constraints  by  enlarging  the  obstacles. 

The  output  constraint  (11)  also  includes  the  bound  on  speed 
and  inputs.  Let  vectors  r,w,.  and  a,  respectively  represent  posi¬ 
tion,  velocity,  and  acceleration  input  in  the  inertia  frame.  A  set  of 
nd  linear  constraints  approximates  the  two-norm  bounded  con¬ 
straints  on  the  acceleration  and  velocity  vectors,  which  in  turn 
limits  the  maximum  turning  rate 


where  R(0)  is  a  rotation  matrix  of  angle  9,  and  p  is  the  radius 
of  the  turning  circle  given  by 

(t'max  Pn  —  l)  Umax  pN  —  1 

P  =  - - -  x  - ~7~n - • 

®max  iN  —  l  ^min  *  HN  —  1 

The  second  term  accounts  for  the  variability  of  the  terminal 
speed  ||t^+Ar|i,||.  The  binary  variable  chooses  either  the 
left  or  right  safety  circle 

Oi  -2(p  +  a„_i)(l-&feft)  <<F  (45a) 

OvL  +  2 (p  +  ttjv-i)  (1  -  6[eft)  >  Op  (45b) 
OpR  -2(P  +  aN-1)bpeft<Op  (45c) 

Op  +  2(p  +  «N_1)6feft  >Op.  (45d) 

With  the  notation  Op  =  [•'ri;enter,r/penter]T,  the  obstacle  avoid¬ 
ance  constraints  of  the  safety  circle  are  written  as 

Vo  :  Center  <  ^low,.  ~  {P  +  «JV-  l)  +  M^jrc-obst,., 

(46a) 

Vccntet  <  2/low, „  ~  (p  +  «/V-l)  +  Af6f irc_ohst,„2 

(46b) 

Center  >  ®high,„  +  (p  +  «JV-l)  -  M l%itc_ohsi,o3 

(46c) 

2/center  >  2/highJO  +  (p  +  £*JV-l)  -  M&£ir(._obgt  ^ 

(46d) 

E^c-obst,„,<3.  (46e) 

t=i 

4)  Interconnected  Constraints  (13):  Over  the  planning 
horizon,  the  coupling  constraints  include  vehicle  avoidance 
constraints 


[cos  em ,  sin  9rn]vpk+j  ^  k  <  ?;max  -  ftj  (42a) 

[cos  0m ,  sin  I  k  C  rxmax  7 j  (42b) 

0m  =  ^ f-,  Vm  =  1 . nd. 

The  minimum  speed  constraint  is  nonconvex  and  requires  nv 
binary  variables  to  express  in  MILP 

[cos  0m ,  sin  0m  ]  > 

^min  4*  ftj  ~  2v  max  Ci,„.  (43a) 

£  <  »<■  -  1- 
m=l 


Xk+j  |  fc  —  ,rfc+j  |  k  Ctal+^Ch,, 


2/fc 


fc+j  I  fc  —  2/fc+j 

>  x\ 


Cal+^C, 


+  d™ 

I-  '  Utnl 


total 


veh.j  > 


J'fc+j|fc  -  fc+7|  fc 


Ec . s3 


i—  I 


(47a) 

(47b) 

(47c) 

(47d) 

(47e) 


dpq  - 

u,total 


(43b) 


where 


2d  +  2otj ,  q  <  p 

2d  +  otj  +  Oj+i.  q  >  P- 
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Beyond  the  planning  horizon,  constraints  on  the  safety  circles 
ensure  the  vehicle  avoidance 


•^center  —  ^center  2(p+d  +  aN_1)  +  MbpJtCi  (48a) 

2/center  <  Center  ~  2(p  +  d  +  «N_l)  +  Mlf^  (48b) 

■^center  —  ^center  +  2  (p  +  d  +  aN-i)-Mt%tCi3  (48c) 

2/center  >  Center  +  2(p  +  d  +  aN^)  -  (48d) 

ECc„<3.  (48e) 

1=1 

5)  Objective  Function  (20):  The  objective  function  uses  a 
binary  variable  bP-ls  to  select  one  visible  point  r£is  from  a  list  of 
cost  points,  from  which  the  cost-to-go  is  known.  LetrcPi  denote 
the  7th  cost  point  and  i  =  1, . . . ,  ncp,  where  ncp  is  a  number  of 
cost  points.  Then 


=  <49a) 

ncp 

E^s„  =  1  (49b> 

ncp 

CP  /_P  t  _  V"'  UP  ip  ( -V  \  /4  0e\ 

/  V*  Vis/  -  /  /vis.,../  \'cp,i) 

t=l 

Jp  >  [cos  Bm,  sin  0rn]  (^rpk+N  { k  +  /p  «is) ,  Vm 

(49d) 


where  the  cost-to-go  /p(rpp  )  from  each  cost  point  to  the  target 
of  vehicle  p  is  calculated  prior  to  M1LP  and  is  constant  in  MILP. 
To  ensure  the  visibility  of  the  selected  cost  point  r£is  from  the 
terminal  point  x£+/v  |  k,  obstacle  avoidance  constraints  are  en¬ 
forced  on  7iint  interpolation  points  that  are  placed  on  the  line 
connecting  <1S  and  a£+w  ]  k 


tJ’lXk+N\k  +  “  wKis  <  *iow,„  -  a/v-i  + 

(50a) 

Fiy'U.x  I  h.  +  C1  -  Fihvi s  <  ?/iovv,o  -  tt/v-r  + 

(50b) 

F‘xI+n  |  k  +  (!  '  wK is  >  xhigh,„  +  «/v-i  - 

(50c) 

Fivt+N  I  k,  +  (!  -  wKis  >  2/high, „  +  «A'  -1  -  Af6fnti(o4 

(50d) 

EC„<3  (50e) 

1  =  1 

//■I  —  t7"  ■  2  —  1,  .  .  .  ,  ft-int- 

"  int 


In  summary,  the  MILP  implementation  of  subproblem 
Pp(x£)  is  to  minimize  ,/p  in  (49d)  subject  to  (9), 
(10),  and  (4 1  a)— (50e).  The  optimization  variables  are 
»i'+J  |  V;+J ,  <+J ,  fc,  0£.  Op  ,  Op.  <,s,  and  all  binary 

variables  (^’s. 
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This  paper  presents  a  formulation  for  distributed  model  predictive  control  (DMPC)  of  systems 
with  coupled  constraints.  The  approach  divides  the  single  large  planning  optimization  into 
smaller  sub-problems,  each  planning  only  for  the  controls  of  a  particular  subsystem. 
Relevant  plan  data  is  communicated  between  sub-problems  to  ensure  that  all  decisions  satisfy 
the  coupled  constraints.  The  new  algorithm  guarantees  that  all  optimizations  remain  feasible, 
that  the  coupled  constraints  will  be  satisfied,  and  that  each  subsystem  will  converge  to  its 
target,  despite  the  action  of  unknown  but  bounded  disturbances.  Simulation  results  are 
presented  showing  that  the  new  algorithm  offers  significant  reductions  in  computation  time 
for  only  a  small  degradation  in  performance  in  comparison  with  centralized  MPC. 


1.  Introduction 

This  paper  presents  a  distributed  form  of  model 
predictive  control  (MPC)  for  problems  involving 
multiple  subsystems,  each  with  independent  dynamics, 
but  subjected  to  coupled  constraints.  Uncertainty  is 
considered  in  the  form  of  persistent,  independent, 
affine  disturbances  acting  upon  each  subsystem.  For 
each  subsystem,  a  local  planning  optimization  is  solved 
once  per  timestep,  without  iteration,  giving  a  more 
scalable  computation  than  a  single  large  optimization 
for  the  whole  system.  The  subsystems  communicate 
with  each  other,  enabling  the  coupling  between  them 
to  be  enforced  in  each  optimization.  The  algorithm 
guarantees  robust  satisfaction  of  the  coupled 
constraints,  feasibility  of  all  optimizations  involved, 
and  convergence  to  specified  targets. 

Centralized  MPC,  using  numerical  optimization 
for  online  replanning,  has  been  widely  developed 
and  applied  for  constrained  systems  (Qin  and 
Badgwell  1997,  Maciejowski  2002),  with  many  results 
concerning  stability  (Bemporad  and  Morari  1999, 
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Mayne  et  al.  2000)  and  robustness  (Scokaert  and 
Mayne  1998,  Kerrigan  and  Maciejowski  2001, 
Richards  and  How  2006).  However,  the  computational 
effort  required  for  the  optimization  scales  poorly  with 
the  size  of  the  system  and  can  become  prohibitive  for 
very  large  systems.  To  address  this  computational 
issue,  attention  has  recently  focused  on  distributed 
MPC  (DMPC)  (Camponogara  et  al.  2002),  breaking 
the  optimization  into  smaller  sub-problems,  with  the 
rationale  that  solving  many  small  problems  is  faster 
and  more  scalable  than  solving  one  large  problem. 
Furthermore,  in  applications  such  as  the  control  of 
multiple  vehicles  (Pachter  and  Chandler  1998),  large 
chemical  plants  (Venkat  et  al.  2004)  or  communication 
networks  (Yan  and  Bitmead  2005),  there  is  a  spatial 
separation  of  control  agents,  and  it  is  therefore  natural 
to  distribute  the  control  computation.  The  challenge 
of  this  approach  is  to  ensure  that  the  distributed 
decision  making  leads  to  actions  that  are  consistent 
with  the  actions  of  others  and  satisfy  the 
coupling  constraints.  Various  approaches  have  been 
investigated,  including  treating  the  influence  of 
other  subsystems  as  an  unknown  disturbance  (Jia  and 
Krogh  2002,  Magni  and  Scatolini  2006),  coupling  pen¬ 
alty  functions  (Shim  et  al.  2003,  Waslander  et  al.  2003, 
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Dunbar  and  Murray  2006),  partial  grouping  of  compu¬ 
tations  (Keviczky  et  al.  2006),  loitering  options  for 
safety  guarantees  (Schouwenaars  et  al.  2004)  and 
dynamic  programming  (Flint  et  al.  2002).  Some 
approaches  involve  iterative  negotiations  between 
subsystems  (Waslander  et  al.  2003,  Venkat  et  al.  2004) 
and  apply  game  theory  or  duality  (Raffard  et  al.  2004) 
to  study  convergence.  Ling  et  al.  (2005)  propose 
a  sequential  solution  scheme  with  similar  features  to 
the  new  method  in  this  paper,  but  applied  to  a  system 
with  coupled  dynamics  and  no  uncertainty.  Distributed 
MPC  is  also  related  to  distributed  optimization  (Singh 
and  Titli  1978),  with  the  distinction  that  the  latter  is 
primarily  concerned  with  finding  optimal  or 
near-optimal  solutions,  whereas  the  former 
commonly  sacrifices  optimality  in  favour  of  fast  compu¬ 
tation.  Distribution  is  further  complicated  when  distur¬ 
bances  act  on  the  subsystems,  making  the  prediction  of 
future  behaviour  uncertain.  The  new  DMPC  algorithm 
in  this  paper  is  unique,  to  the  authors’  knowledge,  in 
addressing  the  issue  of  feasibility  and  convergence  for 
systems  with  coupling  constraints  and  persistent 
disturbances. 

The  key  features  of  the  new  DMPC  algorithm  are  that 
each  subsystem  (e.g.,  a  vehicle)  only  solves  a  sub-pro¬ 
blem  for  its  own  plan,  and  each  of  these  sub-problems 
is  solved  only  once  per  time  step,  without  iteration. 
The  method  employs  at  each  time  step  a  sequential 
solution  procedure,  outlined  in  figure  1(a).  Under  the 
assumption  of  bounded  disturbances,  each  sub-problem 
is  guaranteed  to  be  feasible,  thus  ensuring  robust 
constraint  satisfaction  across  the  group,  and  its  cost  is 
proven  to  decrease,  implying  that  the  system  converges 
to  the  target.  The  plan  data  relevant  to  the  coupled 
constraints  is  then  communicated  among  the  subsys¬ 
tems.  Figure  1(b)  shows  the  information  requirements 
for  sub-problem  p.  Each  sub-problem  accommodates 
(i)  the  latest  plans  of  those  subsystems  earlier  in  the 
sequence  and  (ii)  predicted  plans  of  those  later  in  the 
sequence.  The  principle  of  constraint  tightening,  used 
for  centralized  MPC  in  Chisci  et  al.  (2001)  and 
Richards  and  How  (2006)  to  accommodate  uncertainty 
in  future  behaviour,  is  applied  in  the  new  algorithm  to 
account  for  uncertainty  in  the  actions  of  other  subsys¬ 
tems.  A  key  advantage  of  the  constraint  tightening 
approach  is  that  the  complexity  of  the  optimization 
remains  the  same  as  for  the  nominal  problem,  which  is 
important  for  the  scalability  of  the  new  algorithm. 
At  initialization,  in  common  with  most  constrained 
MPC  formulations,  e.g.,  Mayne  et  al.  (2000), 
Chisci  et  al.  (2001)  and  Ling  et  al.  (2005),  it  is  necessary 
to  find  a  feasible  solution  to  the  centralized  problem,  but 
this  need  not  be  optimal. 

By  adopting  a  sequential  updating  strategy,  as 
outlined  in  figure  1(a),  in  which  only  one  subsystem 


optimizes  at  a  time  and  the  results  are  passed  on  to  its 
neighbours,  we  are  able  to  achieve  firm  guarantees  of 
feasibility  and  convergence  for  the  class  of  problems 
considered.  The  improved  scalability  follows  because 
the  time  to  solve  an  optimization  grows  at  least  with 
the  cube  of  the  number  of  inputs  (Rao  et  al.  1998)  or 
worse  in  more  complex  cases  (Richards  and  How 
2004).  Therefore,  if  we  consider  an  example  problem 
of  n  subsystems  each  with  m  inputs,  giving  a  total  of 
mn  inputs  to  the  system,  the  centralized  solution  time 
grows  as  0((mn)3).  However,  the  distributed  solution 
time,  allowing  for  the  sequential  process,  grows  only 
as  0(m3n)  with  a  reduced  dependency  on  n.  Additional 
benefits  accrue  because  the  computation  delay  in  each 
local  loop  is  equal  only  to  the  time  to  solve  the  local 
sub-problem,  not  the  whole  sequence  (Richards 
and  How  2005).  Furthermore,  parallel  computation 
can  be  exploited  in  typical  cases  (Kuwata  et  al.  2006), 
and  we  show-  how  structure  in  the  constraints  makes 
this  possible. 

Section  2  presents  the  problem  statement  for  DMPC 
in  a  general  form.  Section  3  presents  centralized  MPC 
solution  for  this  problem,  used  for  comparison  and 
initialization  later  in  the  paper.  Section  4  develops  the 
DMPC  algorithm  and  §5  proves  its  feasibility  and 
convergence  properties.  Section  6  investigates  the 
communication  requirements  for  DMPC:  while  the 
method  relies  on  the  ability  to  communicate  intentions 
between  subsystems,  it  is  possible  to  make  use  of 
structure  in  the  constraints  such  that  only  relevant 
data  is  exchanged.  Finally,  §7  presents  illustrative 
examples  of  DMPC  in  simulation,  comparing 
performance  and  solution  times  with  centralized  MPC. 


2.  Problem  statement 

Consider  the  problem  of  controlling  Np  subsystems. 
Each  subsystem,  denoted  by  subscript  p  e  (1, . . .,  Np], 
has  linear,  time-invariant,  discretized  dynamics 

\p(k  +  1)  =  A pxp(k)  +  BpUp(k)  +  w  p(k),  (1) 

where  xp(k)  e  fit Nr  is  the  state  vector  of  subsystem  p 
at  time  k ,  u p(k)  e  is  the  control  input  to 
subsystem  p  and  w p(k)  e  is  the  disturbance  acting 
upon  subsystem  p.  Assume  all  subsystems  (A,,,  B^)  are 
controllable  and  the  complete  states  \p  are  available. 
The  disturbances  are  unknown  a  priori  but  lie  in 
known  independent  bounded  sets 

Vk,p  wp(k)eWpCmNr.  (2) 
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Step  0 


Step  1 


Step  2 


(a) 


(b) 


Figure  1 .  Overview  of  distributed  algorithm,  (a)  Procedure  flow;  (b)  Information  flow. 


Each  subsystem  is  subjected  to  the  following  local 
constraints  upon  outputs  y  (k)  e  SR^ 

y p(k)  =  Cp\p(k)  +  Dpup(k)  (3) 

y p(k)  €  yp  C  V*.  (4) 

where  the  matrices  Cp  and  Dp  and  the  sets  yp  for  each  p 
are  all  chosen  by  the  designer. 

The  whole  system  is  subjected  to  a  set  of  N,.  coupling 
constraints  applied  to  outputs  z cp{k)  e  summed 

across  the  subsystems, 

Vc  €  { 1 , . . . ,  Nc)  : 

zcp(k)  =  E  cp\p(k)  +  Fcpup(k)  Vp  e  Np)  (5) 

Np 

^zt,(t)e2fc^Vt,  (6) 

p=  i 

where  the  matrices  Erp  and  Ffp  for  each  c  and  p  and 
the  sets  Zc  for  each  c  are  all  chosen  by  the  designer. 
The  division  of  the  coupling  into  the  Nc  separate 
constraints  will  be  exploited  in  §6  to  identify  communi¬ 
cation  requirements  based  on  structure. 

The  objective  is  for  tracking  outputs  s p(k)  e  for 
each  subsystem  to  converge  to  target  sets 

sp(k)  =  Gpxp(k)  +  Hpup(k)  (7) 

s p(k)  -►  Sp  as  k  -*  oo,  (8) 

where  the  matrices  Gp  and  H,,  and  the  compact  sets  Sp 
for  each  p  are  all  chosen  by  the  designer.  Note  that 


this  represents  a  decoupled  objective:  coupling  between 
subsystems  is  only  via  the  constraints  on  outputs  zcp{k) 
in  this  problem  statement.  This  captures  an  important 
class  of  problems,  including,  for  example,  maneuvering 
a  group  of  vehicles  from  one  waypoint  to  the  next 
while  maintaining  relative  formation  and/or  avoiding 
collisions. 


3.  Review  of  centralized  robust  MPC 

This  section  presents  a  short  review  of  the  centralized 
form  of  the  robust  MPC  problem  for  the  problem 
in  §7.  It  is  identical  to  the  controller  in  Richards  and 
How  (2006),  treating  the  group  of  subsystems  as  a 
single  system  and  solving  a  single  optimization  for  the 
controls.  The  centralized  formulation  is  relevant  here 
for  three  reasons:  first,  because  a  feasible  solution  to 
the  centralized  problem  is  required  to  intialize  the 
distributed  algorithm;  second,  because  demonstrating 
satisfaction  of  the  centralized  constraints  is  an 
intermediate  step  in  proving  robustness  of  the  distribu¬ 
ted  algorithm;  and  third,  because  centralized  MPC  is 
used  in  the  examples  in  §7  for  comparison  with  the 
new  distributed  controller. 

The  online  optimization  approximates  the  complete 
problem  in  §2  by  solving  it  over  a  finite  horizon  of  N 
steps.  A  control-invariant  terminal  set  constraint  is 
applied  to  ensure  stability  (Mayne  et  al.  2000). 
Predictions  are  made  using  the  nominal  system  model, 
i.e.,  (1)  without  the  disturbance  term.  The  output 
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constraints  are  tightened  in  a  monotonic  sequence 
(Richards  and  How  2006)  to  ensure  robust  feasibility, 
retaining  a  margin  based  on  a  particular  candidate 
policy. 

Define  the  centralized  optimization  problem  P(x(k)) 

J*c(x{k))  —  min  Jc(x(k),  LJ(A) ) 

W) 

NP  N- 1 

=  min  E  E  disf(k  +M)>  sp(fi) 

W)  7~t  Hs 

subject  to  V/  e  {0, . . . ,  N  —  1}  Wp  e  {1, . . . ,  Np } 

xp(k\k)  =  xp(k)  (9a) 

xp(k  +j  +  1  \k)  =  A pxp(k  +j\k )  +  B  pup(k  +j\k )  (9b) 

y  p(k  +j\k)  =  c pXp(k  +j\k )  +  T>pup(k  +j\k )  (9c) 

*cp{k  +j\k )  =  +;'|/c) 

+  F«J,up(*+./|*)  Vc  e  {1 . Nc]  (9d) 

s p(k  +j\k )  =  GpXp(A:  +y|fc)  +  H pap(k  +j\k)  (9e) 

x„(k+N\k)eTP  (90 

y^+y|t)eT,(/)  (9g) 

Np 

E M*  +9W  6^(i)  Vc  e  ( 1 , . . . ,  7Vf },  (9h) 

p=  i 

where 

•  the  double  index  notation  (k  +  j\k)  denotes  a 
prediction  for  j  steps  ahead  from  time  k 

•  the  state  x(k)  denotes  the  combined  state  of  all 
subsystems  (xi(k),  ...,xNp(k)} 

•  the  decision  variable  U(/c)  denotes  combined  control 
sequences  for  all  subsystems  {U|(k),  . . .  ,  U/v^fc)) 
where  each  Up(/r)  denotes  a  sequence  (up(/c|/c:)>  •  •  • . 
u p(k  +  N-  1 1  Ar)} 

•  d(sp,Sp)  denotes  a  distance  metric  (Kerrigan  and 
Maciejowski  2004) 

d(sp,Sp)  =  min  ||sp  -s  ||  (10) 

S p£Sr  1 

•  the  local  constraint  sets  yp(j)  are  tightened  at  future 
plan  steps  in  a  monotonic  sequence,  ensuring  the 
existence  of  a  margin  to  allow  for  future  feedback 
action  in  response  to  disturbances  (Chisci  et  al. 
2001,  Richards  and  How  2006).  The  tightening  is 
embodied  in  the  following  recursions 


V/>e{l . Nf}:yp(0)  =  y  (11a) 

yp(j  +  1  )  =  %(/)-  (C,  +  DpKp(j))Lp(j)WP. 

V/e  {0,...,N-  1}  (lib) 

•  the  tightened  target  sets  Sp(j)  are  found  using  the 


following  similar  recursions  (Richards  and  How 
2006) 

Wp  e  {1, . .  .,Np]  :Sp(0)  =  S  (12a) 

Sp(j+  1)  -  SPU)  ~  (G p  +  HpKp(j))Lp(j)Wp, 

Vje{0,...,N-l)  (12b) 

•  the  tightened  coupling  constraint  sets  Zc(j)  are 
found  using  the  following  recursions 

Vce  {l,...,Nc}  :Zc(0)  =  Zc  (13a) 

Zc(j+  1)  =  ZAJ)  ~  [(Ec,  +  Ft.|Ki(/))LI(/)>V1  ©  ■  ■  ■ 

®(EfWp  +  FfWpK^(j))L/^(/)>V,v,,] , 

V/e  {0,...,A- 1}  (13b) 

•  the  operator  “  ~  ”  represents  the  Pontryagin  differ¬ 
ence  (Kolmanovsky  and  Gilbert  1998) 

A~B  =  {a\a  +  beA,  Vb  e  (?)  (14) 

which  has  the  following  useful  property,  to  be  used  in 
§  5  to  prove  the  properties  of  the  DMPC  algorithm 

+  Vb  eB  (15) 

•  the  matrices  Kp(f)j  e  (0,  . . . ,  N  -  1)  in  (11),  (12) 
and  (13)  are  candidate  control  laws  for  each  subsys¬ 
tem,  chosen  by  the  designer.  A  constant  gain  may  be 
used  if  required,  but  Richards  and  How  (2006)  dis¬ 
cuss  the  benefits  of  the  more  general  time-varying 
gain.  The  associated  state  transition  matrices  L p(j) 
are  given  by 

L,(0)  =  I  (16a) 

Lp(j+\)=(Ap  +  BpKp(j))Lp(j), 

Y/  e  (0, ...  A  -  2}  (16b) 

•  the  terminal  constraint  sets  Tp  in  (9f)  are  robustly 
positive  invariant  sets  (Kolmanovsky  and  Gilbert 
1998),  satisfying 

(Ap  +  BPKP(N  -  l))x;, 

+  (A,  +  BPK„(N  -  1))L P(N  -  \)wP  e  Tp, 

Vxp  e  Tp,  Vw,  e  Wp,  V/>  g  { 1, . . . ,  Np)  (17a) 
(Cpxp  +  DpK p(N  -  1))xp  6  yp(N), 

VxpeTp,  Vpe  )l,...,Np]  (17b) 

E(^+^k  P(A-l))xp  eZe(N), 

p= 1 

vc  6  {i,...,;vf}, 

V(xf, . xTNf  e{T,  x.-.xT*,}  (17c) 

(G pxp  +  HpKp(N  -  1  ))xp  6  Sp(N), 

VX/,  e  Tp,  Vp  e  { 1 . Np}.  (17d) 
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Tools  for  determining  suitable  sets  are  available 
(Kerrigan  2005).  Note  that  if  the  candidate 
controllers  Kp(  )  are  chosen  such  that  they  would  drive 
the  undisturbed  system  to  the  origin  in  N  steps 
or  fewer,  then  L^N-IJ^O  and  the  requirement 
of  robust  control  invariance  in  (17a)  is  relaxed 
to  nominal  invariance,  and  simple  forms  such 
as  equilibrium  points  can  be  employed  as  terminal 
constraints  (Richards  and  How  2006).  The 
centralized  optimization  is  employed  in  the  following 
algorithm. 

Algorithm  1:  Centralized  MPC. 

(i)  Set  k  =  0. 

(ii)  Find  a  solution  to  the  centralized  optimization 
problem  P(x(k)).  If  a  solution  cannot  be  found,  stop. 

(iii)  Apply  control  u*(k\k)  to  each  subsystem  p. 

(iv)  Increment  k  and  go  to  Step  (ii). 

Given  initial  feasibility  of  P(x( 0)),  this  controller 
guarantees  feasibility  of  all  optimizations  P(x(k)),  satis¬ 
faction  of  constraints  (4)  and  (6),  and  convergence  to  the 
targets  (8),  according  to  Richards  and  How  (2006). 


sp(k  +j\k)  =  G  pxp(k  +j\k )  +  Hpu p(k  +j\k)  (1 8e) 

xp(k  +  N\k)eTp  (18f) 

yp(k+j\k)eyp(j)  (18g) 

Zcp(k+j\k)+zcp(k+j\k)eZcp(j)  Vce{\,...,Nc)  (18h) 

where 

•  the  communicated  coupling  outputs  Z p(k)  consist  of 
output  sequences  for  each  constraint  {Z\p(k),  . 

Z NcP{k)}  in  which  each  element  Z cp(k)  is  a  sequence 
of  output  values  {zcp(k\k),  . . . ,  zcp(k  +  N  —  \\k)}. 
Each  communicated  output  value  z cp(k  +  j\k) 
has  two  components,  as  shown  in  figure  1(b): 
(i)  the  most  recent  plans  of  those  subsystems  earlier 
than  p  in  the  planning  sequence  and  (ii)  predicted 
plans  for  subsystems  later  in  the  sequence 


Z cp(k  +j\k)  - 

E  z*“i{k +ilk) 

r  n 

+ 

E  w+m-v 

?6|p+l . Afp) 

V/e  {0, . . . ,  TV  —  2}  (19a) 


4.  Distributed  MPC  algorithm 

This  section  describes  a  distributed  algorithm  for  solving 
the  problem  in  §2.  The  centralized  optimization  P  from 
§3  is  divided  into  a  sequence  of  Np  sub-problems, 
shown  in  figure  1,  with  each  sub-problem  solving  for 
the  trajectory  of  only  one  subsystem. 

Each  sub-problem  p  solves  for  future  controls 
of  subsystem  p  only.  The  current  state  xp(k)  of 
subsystem  p  and  the  output  information  for 
other  subsystems  Z p{k)  are  constant  parameters,  obtained 
by  measurement  and  communication,  respectively.  It  is 
this  communication  of  coupling  information  from  other 
subsystems  that  ensures  consistency  across  the  whole 
system.  The  sub-problem  P  Axp(k),Zp(k)\  is  defined  as 
follows: 

r(xp{k),Zp(k))  =  mmJp(xp(k),Zp{k),\}p(k)) 

IV  I 

=  [TT,  E  d(sp(k +llkh  5/>W) 

j—0 

subject  to  Y/  6  (0, . . . ,  N  —  1 } 
xp(k\k)  =  xp(k)  (18a) 

xp(k  +j+  \  \k)  =  Apxp(k  +j\k)  +  Bpup(k  +j\k)  (18b) 

yp(k  +  i\k)  =  Cpxp(k  +j\k)  +  +j\k)  ( 1 8c) 

z  cP{k  +j\k)  =  E  cpxp(k  +j\k) 

+  ¥cpup(k+j\k)  Vc  e  { 1 . Nc]  (18d) 


icp(k  +  N-  1  \k)  =  z*c,\k  +  N  ~  !  !^) 

+  J2  (E„,  +  Ff(/K?(yV  -  1)) 

<?£(?+ 1 . Vp) 

x  x*(k  +  N-  1  |/c  —  1)  ,  (19b) 

where  z *q(k  +  j\k)  denotes  the  outputs  associated  with 
the  optimal  solution  to  sub-problem  q  at  time  k. 
The  predicted  outputs  for  q  >  p  are  constructed 
from  the  remainder  of  the  plan  from  the  previous 
time-step  and  a  single  step  of  the  control 
law  =  Kq(N  -  l)x?,  to  keep  the  subsystem  state  in 
its  terminal  set. 

•  the  modified  coupling  constraint  sets  Zcp(j)  are 
given  by 

'ice{\,...,Nc)-.ZcNf0)  =  Zc  (20a) 

Z^. ,)(/)  =  Zcp(j)  ~  (Ef/)  +  FcpKp(j))LrV)Wp  (20b) 
ZCNr{j  +  1)  —  Zc\(j)  ~  (Ef|  +  Ff |K|(/'))L|(/)W| 

Y/e  {0,...,yV-2}.  (20c) 

Besides  the  change  to  consider  only  one  subsystem, 
the  key  difference  between  the  DMPC 
optimization  above  and  the  centralized  optimization 
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in  §3  is  the  tightening  of  the  constraints  in 
(20).  Additional  margin  is  included  in  the  coupling 
constraints  of  each  sub-problem  to  account  for 
the  uncertainty  in  the  predicted  plans  for  other 
subsystems.  It  is  this  feature  that  ensures  the  feasibility 
of  the  distributed,  sequential  optimization  process. 
The  sub-problems  Pp  are  employed  in  the  following 
algorithm. 


Algorithm  2:  Distributed  MPC. 

(i)  Set  k  =  0.  Find  a  solution  to  the  initial  centr¬ 
alized  problem  P(x( 0)).  If  a  solution  cannot  be 
found,  stop. 

(ii)  Apply  control  u p{k)  =  u*(k\k)  to  each  subsystem  p 

(iii)  Increment  k. 

(iv)  For  each  subsystem  p  in  order  1 ,Np: 

(a)  Construct  the  outputs  Z p(k)  from  other 
subsystems,  defined  by  (19),  via  communication. 

(b)  Solve  sub-problem  Pp(xp(k),  Z p(k)). 

(v)  Go  to  Step  (ii). 

Note  that  Step  (i)  does  not  require  finding  the 
optimal  solution  of  the  centralized  problem, 
although  that  is  one  way  of  finding  the  initial 
plan.  Also,  the  algorithm  presented  above  implies 
communication  between  all  pairs  of  subsystems  at 
every  step,  to  provide  the  information  (19)  required  in 
Step  (iv)a.  This  form  will  be  retained  throughout  the 
next  section  for  simplicity  when  proving  the  properties 
of  the  algorithm.  Then  in  §6,  it  is  shown  that  the  identi¬ 
fication  of  structure  in  the  constraints  leads  to  equiva¬ 
lent  forms  with  much  weaker  communication 
requirements. 

In  Algorithm  2,  observe  that  the  state  xp(k)  of 
subsystem  p  is  not  needed  until  the  solution  of 
sub-problem  Pp(xp(k),Zp(k)),  and  that  the  control 
u p(k)  is  available  as  soon  as  that  sub-problem 
is  solved.  Therefore,  by  judicious  choice  of 
sampling  schemes  (Richards  and  How  2005),  the 
computational  delay  in  each  local  feedback  loop 
is  only  the  time  to  solve  the  local  sub-problem  in 
Step  (iv)b,  not  the  whole  sequence  of  sub-problems 
in  Step  (iv). 


5.  Properties  of  DMPC 

This  section  proves  the  main  results  of  this  paper:  that 
Algorithm  2  guarantees  feasibility  of  all  optimizations 
involved,  satisfaction  of  constraints  (4)  and  (6),  and 
convergence  to  the  targets  (8). 


5.1  Feasibility  and  constraint  satisfaction 

The  development  for  feasibility  and  constraint  satisfac¬ 
tion  involves  several  intermediate  results,  all  proving 
that  the  feasibility  of  each  sub-problem  follows  from 
feasibility  of  its  predecessor  in  Algorithm  2,  as  shown 
in  figure  1(a).  This  sequence  is  proven  using  three  propo¬ 
sitions.  Proposition  1  shows  that  existence  of  a  feasible 
solution  to  the  centralized  problem  at  some  time  step 
implies  feasibility  of  the  first  sub-problem  at  the  sub¬ 
sequent  time  step.  Proposition  2  shows  that  feasibility 
of  any  sub-problem  1  to  ( Np  —  1)  at  any  step  implies  fea¬ 
sibility  of  the  subsequent  sub-problem;  and  Proposition 
3  shows  that  feasibility  of  the  final  sub-problem  Np  is 
equivalent  to  the  existence  of  a  feasible  solution  to  the 
centralized  problem.  The  overall  result  follows  by 
recursion:  the  existence  of  an  initial  feasible  centralized 
solution,  ensured  by  Step  (i)  of  Algorithm  2,  implies 
feasibility  of  all  subsequent  optimizations. 

In  every  case,  feasibility  is  shown  by  constructing  a 
particular  candidate  solution,  using  the  candidate 
policies  K p(j)  to  compensate  for  disturbances. 
The  section  begins  with  a  lemma  showing  that  the 
constraints  of  one  of  the  sub-problems  are  equivalent 
to  those  of  the  centralized  optimization  in  §3,  which 
will  be  useful  in  subsequent  results.  There  follow  the 

thrpp  nmnneit  inne  fnrmino  fhp  overall  rpnircinn  Thf» 
r  t"  “  *  •  *  **■ 

section  ends  with  the  theorem  combining  the  intermedi¬ 
ate  results  to  prove  the  main  result. 

Lemma  1:  ZcNp(j)  =  Zc(j)  V/  6  {0, ... ,  N},  i.e.,  the 

constraint  sets  for  the  summed  coupling  outputs  in  the 
centralized  problem  P,  computed  using  (13),  are  identical 
to  the  constraint  sets  for  the  coupling  outputs  in 
sub-problem  P^p,  computed  using  (20). 

Proof:  Comparison  of  (13a)  and  (20a)  shows  that 
Zcxfi0)  =  Zc(  0).  From  Kolmanovsky  and  Gilbert 
(1998),  Theorem  2.1(v),  it  is  known  that 

(.4~B,)~B2  =  4~(Bi®&). 

Applying  this  result  to  the  recursions  in  (20b)  gives 


Zfif)  =  ZcNp(j)  ~  [(Ec2  +  Ff2K2(/))L2(/)W2  ©  ■  •  ■ 
®(Etw,  +  F  £  ,v,  K  np  (/) )  L  /vp  (/')  W  ] . 

Appending  the  recursion  step  (20c)  then  gives 


ZCNp(j  +  1)  —  ZcNp(j)  ~  [(Ec|  +  Ff|Ki(/'))Li(/)W’i  ©  •  •  • 

©  (E cNp  +  Fr,vp  K Np (/))  L,vp  (J)W,\p  ] 

which  is  identical  to  the  recursion  (13c)  used  to  construct 
sets  Zc(j).  Hence,  since  the  initial  sets  Zcn  ( 0)  and  Zc( 0) 
are  identical  and  the  recursions  used  to  construct 
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subsequent  sets  Z,^p(j)  and  Zfj)  for  j  >  0  have 
been  shown  to  be  equivalent,  then  ZcNp  (j)  =  ZC(J) 
for  all  j.  □ 

Notation:  In  the  remainder  of  this  section,  the 
following  notation  is  employed. 

•  U*(A)  denotes  a  known  solution  to  the  centralized 
problem  P(x(k)). 

•  U*(A)  denotes  a  known  solution  to  the  sub-problem 
Pp(xp(k),  Zp(k)). 

•  u*(-|A)  denotes  the  individual  control  elements  of 
the  known  solution  U*(k). 

•  x*(-|fc),  y*(-|k),  z *cp(-\k)  and  s*(-| k)  denote  the 
individual  state,  local  constraint  output,  coupling 
constraint  output  and  tracking  output  elements, 
respectively,  associated  with  the  known  solution 
U *(k),  together  satisfying  (18b),  (18c)  ( 1 8d)  and 
08e). 

•  U p(k)  denotes  a  candidate  solution  to  the  sub¬ 
problem  Pp(xp(k),  Z p(k)). 

•  Up(-j^)  denotes  the  individual  control  elements 
of  (Jp(k). 

•  Xp(-|^),  y  (-| k),  icp(-\k)  and  sp(-|£)  denote  the  indivi¬ 
dual  state,  local  constraint  output,  coupling  con¬ 
straint  output  and  tracking  output  elements, 
respectively,  associated  with  the  candidate  solution 
Up(k),  together  satisfying  (18b),  (18c)  (18d)  and 
(18e). 

The  following  three  propositions  are  required  to  prove 
feasibility  of  Algorithm  2.  Note  that  each  proposition 
follows  on  from  its  predecessor.  This  sequence  will 
also  be  used  in  a  recursion  in  the  subsequent  Theorem 
to  prove  that  feasibility  of  each  sub-problem  follows 
from  feasibility  of  the  preceding  sub-problem. 

Proposition  1:  If 

(i)  at  any  time  step  k0,  a  set  of  solutions  tJ*(ko)  is 
known ,  satisfying  the  constraints  of  P{x(kf))  in  §  3, 
and 

(ii)  the  controls  up(ko)  —  u*(Ail^o)  are  applied  to  the 
system  at  time  step  k0, 

then  at  the  subsequent  time  step  k0  +  1  the  first 
sub-problem  P\(x\(ko  +  1),  Zi(£o  +  1))  m  §4  is 
feasible  for  all  disturbances  v/\(ko)  e  W |. 

Proof:  Consider  the  following  candidate  solution 
Ui(A'0  +  1)  for  sub-problem  P|(X|(A0  +  1),  Z i(k0  -I- 1)) 

ui(Ao  +  1  +y|Ao  +  1) 

=  u](Ao  +  1  +j\ko)  +  K|(/)L|(/)W](Ao) 

V/e{0 . N-2)  (21a) 

tt|(A'o  +  A/jko  +  1) 

=  K,()V-l)x,(*o  +  JV|*o+l)  (21b) 


This  candidate  solution  is  formed  by  taking  from  the  tail 
of  the  known  feasible  solution  U*(Ao),  adding  one  step 
using  controller  Ki(JV  —  1)  at  the  end,  and  adding 
perturbations  representing  the  rejection  of  the 
disturbance  by  the  candidate  controller  Ki(-)  and  the 
associated  state  transition  matrices  Li  (-)- 
The  initial  state  xi(k0+l)  is  given  by  the  true 
dynamics  X](Ao  +  1)  =  Aixi(ko)  +  B|Ui(/co)  +  wi(Ao). 
Also,  the  dynamics  constraints  of  P(x(k0))  ensure  x*(k0 
+  1|A0)  =  A|Xj(A0)  +  BiU](A0).  Comparing  these  two 
expressions  shows  x\(ko  +  1)  =  xf(Ao  +  1  l^o)  +  wi(Ao)- 
With  this  initial  condition  and  the  candidate  control 
sequence  (21a),  the  dynamics  constraint  (9b)  gives  the 
following  candidate  state  sequence,  also  expressed  as  a 
perturbation  from  the  plan  at  time  k0  and  using  the 
state  transition  matrices  (16) 

x, (k0  +  1  +  Jlk0  +  1)  =  x*(k  o  +y+l  |fc0)  +  Li(/)wi(Ao) 

Vj  e  1 }  (22) 

and  corresponding  local  constraint  output  sequences 

y, (ko  +  1  +j\ko  +  l)  =  y*(ko  +./  +  mo) 

+  (C,  +  D1K,(/))L1(/)w1(Ao) 

V;  g  {0, ....  A  —  2}  (23a) 

y,(fe0  +  N\k0  +  1)  =  (C,  +  D,K,(tV-  1)) 
x  x*(kQ  +  7V|k0) 

+  (C,+D1Ki(A-  1)) 
x  Lt(A-  l)wi(Ao).  (23b) 

Feasibility  of  U*(k0)  implies  yt(^o  +j  +  l|ko) 
e  ^iO'-F  1)  V/G  {0, 1},  from  (9g),  and  (C|  + 
D|Ki(A  —  l))x[(A0  +  N\ko)  e  30(^0*  from  (90  and 
(17b).  Therefore,  since  30(/+  1)  is  related  to  T0(/)  by 
the  Pontryagin  difference  in  (lib)  with  the  known 
property  (15),  the  perturbation  expressions  (23)  imply 
yfk0  +  I  +ylko  +')  e  300')  V/  e  {0, . . . ,  N  —  1},  satisfy¬ 
ing  all  the  local  constraints. 

Next,  consider  the  coupling  constraints  (18h).  Similar 
to  (23),  the  coupling  constraint  outputs  associated  with 
the  candidate  solution  (21a)  are 

Vc  €  {1,. . . ,  Wf}: 

Zfi(Ao  4-  1  +;|ko  +  1)  =  z*,(/co  +./  +  1  l^o) 

+  (Er|  +  FfiKi(/))LiO)wi(A0) 

V/  e  10, . . . ,  N  -  2)  (24a) 

i-c\{ko  +  N\ko  +  1)  =  (Eri  +  FfiKi(A'  —  1)) 
x  x*(A0  +  N\ko) 

+  (Er,  +Fr,K,(A-  1)) 
x  L,(/V-  l)wi(Ao) 


(24b) 
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Combining  these  with  the  communicated  output 
sequences  Zi(fc0  +  1),  constructed  using  (19),  gives 
expressions  for  the  summation  on  the  left  hand  side 
of  (18h) 

zci  (Aro  4  1  +j\ko  4  1)  4  zd(A0  4  1  +j\k0  4  1) 

=  ^2  Kqfto  +  1  +il^o) 

_qe{\...Np) 

+  (Ef|  +Ff]K|(/))Li(/>i(fco) 

Vy  e  {0, . . . ,  N  -  2}  (25a) 

+  H\ko  4  1)  4  zc](&o  +  N\ko  4-  1) 

=  J2  (Eri+Ff|K,(iV-l))x,(*o  +  ^|*b) 

J?e{l  ■A'd 

4(Ef,  +FdK,(iV-  1))L,(JV-  l)w,(A0).  (25b) 


The  above  has  shown  that  the  candidate  sequence 
Ui(&o  +  l)  is  a  feasible  solution  to  P|(xi(A-o  + 1), 
Zi(/co+l)X  hence  this  sub-problem  must  be 
feasible.  □ 

Proposition  2:  If 

(i)  at  any  time  step  k0,  a  set  of  solutions  U*(Ao)  is 
known,  satisfying  the  constraints  of  P(x(A:o)), 

(ii)  the  controls  »p(ki3)  =  u*(A:o|&o)  are  applied  to  the 
system  at  time  step  k0 , 

(iii)  for  some  subsystem  po  6  { 1 , . . . ,  (Np  —  1))  a  solution 
u;o(*o  +1)  is  known,  satisfying  the  constraints  of 
sub-problem  Ppo(\po(k0  4  1),  ZPo(kQ  4  1)), 

then  the  sub-problem  Ppo+l(xpo+l(k0  4  1), ZPa+\{ko  4  1)) 
is  feasible  for  all  disturbances  w/)(l  + 1  (A'o )  e  WPo+|. 

Proof:  Consider  the  following  candidate  solution 
£U+i(4  +  1)  for  P,,0+i(X/>o+i(*o  +  1), ZP()+\(ko  +  1)) 


In  order  to  show  that  these  quantities  satisfy  the 
constraints  (9h),  first  use  the  fact  that  since  U*(A:o) 
satisfies  the  constraints  of  P(x(ko))  then  from  (9h),  (9f) 
and  (17c) 


\  r,*  IK  \  1  I  \ 

/  ,  -r  1  -TJ |/vuy 

qe[\...Np) 


c  ^cyj  ~r 


i 


Vj  e  {0, . . . ,  JV  —  2} 


J2  (Ed +FdK,(iV- l))x*(A:0  +  iV|A:o) 
?e|l  -Np) 


e  ZC{N). 


Then,  using  Zc(j)  —  Zcnp(J)  Vy  from  Lemma  1,  the 
Pontryagin  differences  (20c)  and  the  perturbation 
expressions  (25)  show  zd(Ao  +./+  l|&o  +  1)  +  zd(Ao  + 

y'4  1  |Ar0  4  1)  e  ZdU)  V/e  {0 _ ,7V- 1}.  Hence  the 

coupling  constraints  are  satisfied. 

Finally,  it  remains  to  show  that  Vp(k0+  1)  satisfies 
the  terminal  constraints.  Combining  the  final 
control  step  (21b)  and  the  state  sequence  (22)  in 
the  dynamics  constraint  (18b)  gives  the  predicted 
terminal  state 


“/>o+i(^o  +  1  +./l^o  4  1)  —  Up0+|(^o  +  1  +j\ko) 

+  Kw+ 1  (j)  LPo+i  ( j)v/po+ 1  ( k0 ) 

Vy  €  {0, ....  TV  -  2}  (26a) 

“/>o+i  (^o  +  A7|  Aro  +  1)  =  KPo+i  (TV  —  1) 

x  xPo+l(k0  +  N\k0  +  1).  (26b) 

This  sequence  is  constructed  in  the  same  way  as  the 
candidate  in  (21).  Satisfaction  of  the  local  constraints 
(18g)  and  the  terminal  constraints  (18f)  follow  from 
identical  arguments  to  those  used  in  Proposition  1. 
It  remains  to  show  that  the  coupling  constraints  (1 8h) 
are  satisfied  by  this  candidate.  Substituting  the 
candidate  sequence  Up0+i(Ao  4  1)  into  the  construction 
of  the  communicated  data  (19)  gives 

2c(po  +  l)(&0  +7  +  1 1^0  +  1)  +  Zf(p0+|)(^0  +7  4  1 1^0  +  1) 

—  Zroiko  +7  +  1  |A:o  +  1)  +  zcp0(ko  +j+  1  l&o  +  1) 

+  (E<-Q>o+n  +F<'0’o+i)Ep0+i(/))L/,0+i(/')Wp0+i(A'o), 

Y/  6  {0, . . . ,  TV  —  1}  (27) 

and  we  know  from  feasibility  of  U*o(A0  4  1)  that 


X|(/r0  4  N  +  1|A’0  +  1)  =  (A i  4  B|K|(fV  -  1)) 
x  x*(k0  +  7V|A0) 

+  (A,  4B,K,(7V-  1)) 
x  L,(7V-  l)w,(A-o). 

Feasibility  of  U*(Ao)  implies  X|(A0  4  N\k0)  e  T |  and 
hence  the  invariance  requirement  (17a)  ensures 
X|(A-0  4  N  4  1|Ao  4  1)  e  Ti,  satisfying  the  terminal 
constraint. 


z*p0(ko  +j  4  I  |Ap  4  1 )  4  zcpo(ko  +j  4  1  l^o  4  1) 

6  Zcpo(j)  Vye{0,...,Af-l). 

Therefore,  by  applying  the  property  of  the  Pontryagin 
difference  (15)  to  the  constraint  tightening  recursion 
(20b),  we  find  that 

A-(/>o+  i)(^o  +7  4  1  |A'o  4  1)  4  2f(/)0+i  )(ko  4  y  4  I  I  A'o  4  1) 

^  2c(pa+\) 
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hence  the  coupling  constraints  (18h)  are  satisfied, 
the  candidate  U^+iffco+l)  is  feasible  and  so 
the  sub-problem  Ppo+l(xpo+i(k0  +  l),ZPo+1(fco  +  1))  is 
feasible.  □ 

Proposition  3:  If  at  any  time  step  k0  +  1  there  exists  a 
solution  U*(/c0  +  1)  for  every  p  6  {1, . . . ,  Np),  each 
satisfying  the  constraints  of  corresponding  sub-problem 
Pp(xp(ko  -fc  l),Zp(ko  +  1))  then  the  combination  of  those 
solutions  U(fc0  +  1)  =  (U*(/c0  +  1), . . . ,  U^(fc0  +  1)1  also 
satisfies  the  constraints  of  the  centralized  problem 
P(x(k0+\)). 

Proof:  Satisfaction  of  the  local  constraints  (9g) 
and  terminal  constraints  (9f)  of  the  centralized 
problem  follow  from  satisfaction  of  the  corresponding 
constraints  (18g)  and  (180  of  each  sub-problem 
p  e  {1, . . . ,  Np).  It  remains  to  consider  the  coupling 
constraints  (9h).  Substituting  into  (18h)  with  p  =  Np 
for  the  communicated  outputs  from  (19)  gives 


Np 

’Y2z*p^°  +2  +  11*0  +  1)  G  -Zdv 

p= i 


which,  Since  £c{f)  —  ^cNp (j)  Y/  from  LCuuua  1,  implies 


that  the  coupling  constraints  (9h)  of  the  centralized 
problem  are  satisfied  if  the  coupling  constraints  (18h) 
of  the  final  sub-problem  PNp(xNi>(k0  +  l),ZNp(k0  +  1)) 
are  satisfied.  Therefore  U(/c0  + 1 )  satisfies  all  the 
constraints  of  the  centralized  problem  P(x(ko  +  1)). 


Theorem  1:  Robust  feasibility  and  constraint 

satisfaction  of  DMPC.  If  the  systems  (1)  are  subjected 
to  disturbances  obeying  (2)  and  controlled  using 
Algorithm  2,  and  if  a  feasible  solution  to  the  initial 
centralized  problem  P(x(0))  can  be  found ,  then  all 
subsequent  sub-problems  Pp(xp(k),  Zp{k  j)  are  feasible 
and  the  outputs  (3)  and  (5)  satisfy  constraints  (4)  and 
(6),  respectively. 


Proof:  Assume  that  at  some  time  step  k0,  a  solution 
is  known  to  the  centralized  problem  P(x{ko)).  Then  by 
Proposition  1,  sub-problem  P\(x\(k0  +  1), Z](k0  +  1)) 
is  feasible.  Using  Proposition  2  recursively,  every 
subsequent  sub-problem  Pp(xp(k0  +  1),  Zp(k0  +  1))  is 
feasible  for  p  e  {2, . . . ,  Np}.  By  Proposition  3,  feasibility 
of  the  final  sub-problem  PNp(xNfikn  +  \).Zpjp(k0  +  I)) 
implies  that  a  solution  is  known  for  the  centralized 
problem  P(x(ko  +  1)).  This  completes  an  outer 
recursion,  showing  that  knowledge  of  a  feasible  solution 
to  P(x(ko))  implies  that  a  feasible  solution  can  be  found 
for  P(x(A'0  +  1)).  Therefore,  knowledge  of  an  initial 
feasible  solution  U(0)  implies  feasibility  of  every 
subsequent  sub-problem  in  the  DMPC  algorithm. 


Constraint  satisfaction  follows  from  Proposition  3, 
which  showed  that  the  combined  solutions  to  all  the 
sub-problems  satisfied  the  centralized  constraints. 
Substituting  y  =  0  into  (9)  and  combining  with  the 
control  law  u,,(A:)  =  u*(fc|Ar)  yields  constraints  that  are 
equivalent  to  those  in  the  problem  statements 
(4)  and  (6).  Therefore  feasibility  implies  constraint 
satisfaction.  □ 


5.2  Convergence  to  target 

The  proof  of  convergence  under  Algorithm  1  builds 
on  the  results  concerning  feasibility  in  §5.1. 
Propositions  1,  2  and  3  showed  that  a  particular 
candidate  solution  was  feasible  for  each  sub-problem. 
To  prove  convergence,  this  solution  is  used  to  provide 
an  upper  bound  on  the  optimal  cost  J*(xp(k),Zp(k)) 
for  each  sub-problem.  This  then  leads  to  a  Lyapunov- 
like  result  showing  proving  the  desired  convergence 
result  (8).  The  section  begins  with  a  lemma  regarding 
the  distance  metric  which  will  be  useful  in  the  proof 
of  robust  convergence. 

Lemma  2:  Property  of  distance  metric.  Let  B  and  C  be 
compact  sets.  For  any  a,  d( a  +  c,  B)  <  d( a,  B  ~  C)  Vc  e  C. 

Proof:  Let  d\  =a(a,B~C).  From  the  definition  (10) 
of  the  metric,  there  exists  a  b*  e  B  ~  C  such  that 
||  a  —  b*  ||  =  d\.  Then  by  the  definition  (14)  of  the 
Pontryagin  difference,  the  vector  b'+ceB  for  all 
c  £  C.  Therefore  d(st  +  c,  B)  =  min/,eS  ||a  +  c  -  b||  < 
|| a  +  c  —  (b*  +  c)||  =  || a  —  b*||  =  d\ .  □ 

Theorem  2:  Robust  convergence  of  DMPC.  If  the 

systems  (1)  are  subjected  to  disturbances  obeying  (2)  and 
controlled  using  Algorithm  2,  and  if  a  feasible  solution 
to  the  initial  centralized  problem  P(x(0))  can  be  found , 
then  the  tracking  outputs  (7)  converge  as  required  by  (8). 

Proof:  From  Theorem  1  and  Propositions  1 
and  2,  we  know  that  if  U*p(k0)  is  a  feasible  (in  this 
case  also  optimal)  solution  to  Pp(x(ko),Zp{koj) 
for  any  k0  >  1  and  the  control  u*(A:o)  =  u*(A:0|^o)  <s 
applied,  then  a  candidate  solution  +  1)  is  feasible 
solution  to  sub-problem  Pp(xp(k0  +  1),  Zp(k0  +  1)) 
for  p  e  {1, . .. ,  Np]  where  the  control  elements  of  the 
candidate  are  given  by 


Up(ko  +  1  +y|Ao  +  I)  =  up(ko  +  1  +./l*o) 

+  Kp(j)Lp(j)vt  p(ko) 

V/  e  {0, ....  N  -  2} 

U/>(Ao  +  MA'o  +  1)  =  K,,(iV  —  Ox^A'o  +  N\ko  +  1). 

(28) 
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Then  from  (18e),  the  predicted  tracking  outputs 
associated  with  that  candidate  solution  are 

sp(k  o  +j  +  1  l&o  +  1)  =  s*p(k  o  +j  +  1 1*0) 

+  (Gp  +  HpKp(j))Lp(j)wp(k0) 

Vj  e  {0,  ...,2V  —  2}  (29a) 

sp(k  o  +  N\ko  +  1)  =  (Gp  +  HpKp(./V  —  1)) 

x  x*(k0  +  N\ko  +  1) 

+  (G,  +  H,Kp(JV-l)) 

x  LP(N-  l)wp(/fe0).  (29b) 

Applying  Lemma  2  to  (29a)  gives 

d(sp(ko  +  1  +j\k0  +  1),  Sp(j)) 

<  d(s*(ko  +j+  1  |ko), 

Sp(j)  ~  (Gp  +  H,K,(/))L p(j)Wp) 

Vj  e  {0, N  -  2},  Vw p(k0)  e  Wp 

and  by  comparison  with  the  target  set  tightening 
(12b)  this  is  equivalent  to 

d(sp(ko  +  1  +j\ko  +  1),  Sp(jj) 

<  d(s*p(k0  +j  +  1  \k0), Sp(j  +  1 ))  Vje{0,...,N-2), 
Vw p(k0)  €  Wp. 

Furthermore,  feasibility  of  U*(£o)  implies  (Gp  + 
HPKP(A  -  l))x*(k0  +  A|fc0)  e  SP(N),  from  (9f) 
and  (17d).  Therefore,  the  Pontryagin  difference 
in  (12b)  and  the  property  (15)  imply 
sp(k0  +  N\k0  +  1)  6  SP(N-  1),  such  that  the  final  step 
incurs  no  cost  because  the  predicted  tracking  output  is 
inside  the  target  set 

d(sp(k0  +  N\k0  +  1),5P(A  -  1))  =  0. 

Combining  these  findings  gives  a  bound  on  the  cost 

dp(^P(ko  +  1)>  Zp(A'o  +  1),  Up(Ao  +  1 )) 

Jp(xp(ko  +  l),Zp(ko  +  I ),  L/)(A'o  +  1)^ 

,v-i 

=  ^  d(sp(ko  +j  +  1  |A'o  +  1),  Sr{f}) 

7=0 

V-2 

<^4;(kn+.;+l|ko),^(/+l)) 

7=0 


and  this  bound  can  be  expressed  in  terms  of  the  optimal 
cost  at  ko 

JP{\p(ko  +  1),  Zp(fco  +  1 ),  Gp(ko  +  1  )j 
<  rp(xp(k0),zp(k0))  -  d{s;(ko\ko),  Sp( 0)). 

Since  XJp(ko  +  1 )  is  a  feasible  solution,  this  provides  a 
bound  on  the  optimal  cost 

j;(x{k0  + 1), zp(kt:  +  1))  <  rp(xp{k0),Zp(k0)) 

-d(S;(k0\ko),sp(0)) 

Since  d(  j  >  0  by  construction  in  (10),  then  Jp(\p(k), 
Z p(k))  must  be  a  decreasing  function.  Furthermore, 
J*p(xp(k),Zp(k))  cannot  become  negative,  by  definition, 
so  it  must  converge  to  a  steady  value.  This  implies 
d(s*(k\k),  Sp(0))  =  d(sp(k),Sp)  -*  0  as  k  -*■  oo  which, 
by  the  definition  of  the  metric  d{  ■ )  and  the  assumption 
that  each  Sp  is  compact,  implies  that  s p(k)  —*  Sp 
as  k  — »•  oo. 

Remark  1:  Note  that  the  results  in  §5.1  did  not  make 
use  of  the  cost  function  or  of  optimality  of  the  solutions. 
Therefore,  for  problems  that  do  not  require  convergence 
and  are  only  concerned  with  constraint  satisfaction,  any 
form  of  cost  function  can  be  employed  and  the  solutions 
found  for  each  sub-problem  need  only  to  be  feasible,  not 
necessarily  optimal.  For  example,  spacecraft  formation 
control  is  sometimes  expressed  as  the  need  for  each 
spacecraft  remain  in  an  “error  box”  relative  to  the  rest 
of  the  formation  (Inalhan  et  al.  2002).  It  would  be 
possible  to  employ  DMPC  minimizing  fuel  consump¬ 
tion,  the  typical  objective  for  spacecraft,  and  still 
guarantee  constraint  satisfaction. 


6.  Communication  requirements 

The  development  so  far  has  assumed  that  problem 
Pp(xp(k),  Zp(k))  includes  coupling  data  z*cq(k+j\k)  for 
all  constraints  c  e  {1, . . . ,  Nc)  and  for  all  other  sub¬ 
systems  q  ^  p,  according  to  (19).  This  implies  communi¬ 
cation  with  all  other  subsystems  at  every  time  step. 
However,  this  section  shows  that  by  identifying 
structure  in  the  coupling,  significant  relaxation  of  that 
requirement  can  be  achieved. 

Define  Vc  as  the  set  of  all  subsystems  affected  by 
constraint  c 


Vc  =  {pe{\ . Np}  |  [Ef,Fc,]#0}.  (30) 
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This  allows  us  to  identify  and  eliminate  identically  zero 
terms  in  the  summation  of  the  coupling  data  (19), 
leading  to  the  following  equivalent  form. 


z  cP(k  +j\k)  = 


E  z  %{k  +j\k) 

LzeTVnU . p-l) 


E  Kq(k+j\k-\) 

^e'Prn{/j+ 1 , . . . ,  Np } 


i cp(k  +  N  -  1  \k)  = 


y/  e  (o, . . . ,  tv  —  n 

E  K,(k  +  N-\\k) 

L?eAn(i . p-l) 


(31a) 


+ 


E  (E^  +  F^Af-l)) 

qeVcn{p+\,...,Np ) 


X  x*(k  +  N-  1 1 A  —  1 ) 


(31b) 


Also  define  Cp  as  the  set  of  constraints  involving 
subsystem  p 

C,  =  {ce{l,...,tff}  |  [Eep  Fcp]  A  0(.  (32) 

Then,  according  to  (18d),  constraints  not  in  Cp  have  no 
effect  on  sub-problem  Pp(xp(k),  Z p(k)),  and  hence  can  be 
omitted,  replacing  (18d)  and  (18h)  with 

z cp(k  +j\k)  =  E cpXp(k  +j\k)  +j\k) 

Vc  e  Cp  (33a) 

ZcP(k  +j\k)  +  zcp(k  +j\k)  e  Zcp(j)  Vc  e  Cp  (33b) 


from  sub-problem  Ppa,  and  all  the  information  needed  to 
solve  sub-problem  PPo+ ]  is  available  at  the  time 
sub-problem  PPo  is  started.  Therefore,  sub-problems 
PPo  and  Ppo+i  can  be  solved  in  parallel.  Consider  a 
typical  example  in  which  subsystem  1  is  the  leader  and 
subsystems  2  and  3  are  followers,  constrained  relative 
to  the  leader  but  not  to  each  other.  Then  sub-problems 
P2  and  P2  can  be  solved  in  parallel,  giving  the  planning 
sequence  { 1 , (2 & 3),  1,(2&3),  . . .}.  This  principle  can 
be  extended  to  enable  any  number  of  decoupled 
subsystems  to  plan  in  parallel  (Kuwata  et  al.  2006). 


7.  Examples 

This  section  presents  simulation  results  demonstrating 
DMPC  and  comparing  its  performance,  in  terms  of 
both  the  objective  value  and  computation  time,  with 
centralized  MPC.  Further  results  can  be  found  in 
Richards  and  How  (2004),  illustrating  the  application 
of  DMPC  to  multi-aircraft  collision  avoidance,  and  in 
Richards  and  How  (2005),  implementing  DMPC  on 
robotic  vehicles. 

The  examples  in  this  section  involve  five  identical 
point  masses  moving  in  1-D  and  required  to  remain 
within  a  specified  distance  of  each  other  as  they  moved 
from  an  initial  offset  to  the  origin.  The  dynamics  of 
each  subsystem  were 


Vpe{  1 . 5}: 


0.5 

1 


A  random  disturbance  of  up  to  0.1  in  magnitude  acted 
upon  each  subsystem 


The  modified  forms  (31)  and  (33)  are  equivalent  to  the 
originals  (19)  and  (18d,  18h),  respectively,  so  the  results 
on  feasibility  and  convergence  still  hold. 

Finally,  inspecting  (31)  and  (33)  shows  that  problem  p 
only  involves  coupling  data  from  other  subsystems 
q  e  Vc  for  all  c  e  Cp.  Define  the  set  of  all  other  sub¬ 
systems  coupled  to  p  as 


dp  €  {1, . . . ,  5) :  Wp  =  {w  e  !K2  |  ML  <  0.1}. 

Local  constraints  restricted  the  position  |.rp,  i|  <  10, 
velocity  \xp^\  £  1  and  control  \up\  <  1.  The  target  was 
to  drive  all  the  masses  close  to  the  origin  |.Yp,  ||  <  0.5 
and  the  initial  condition  had  all  masses  stationary  and 
collocated  with  xp  =  [5  0]r  dp  e  {1, . . . ,  5}.  The  corre¬ 
sponding  constraint  parameters  were  dp  e  j  1 , . . . ,  5 } : 


Qp  = 


U  \\  M- 


(34) 


Therefore,  sub-problem  Pp(xp(k),  Z p(k))  only  requires 
information  to  be  communicated  from  subsystems 
in  Qp. 

Structure  in  the  constraints  also  enables  parallel 
solution  of  sub-problems  in  some  cases.  Suppose 
Po  +  1 1  QP„  for  some  subsystem  po  G  { 1, . . . ,  Np  —  1 }. 
Then  sub-problem  kpa  \  I  does  not  need  any  information 


'1 

0” 

'O' 

C,= 

0 

1 

,  D ,= 

0 

0 

0 

1 

^  =  )y  e  N3  |  l.vil  <  10,  |.v2|<l,  I.V3|<1} 
G,  =  [1  0],  Hp  =  [0], 

Sp  =  {.v  e  ill  |  |j|  <  0.5}. 
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Figure  2.  Simulation  results  from  DMPC  for  example  with  five  point  masses.  Constraints  are  satisfied  throughout  and  states 
converge  to  targets,  (a)  State  space  trajectories;  (b)  Position  time  histories. 


The  distance  metrics  d(-,  •)  were  implemented  using  one 
norms.  There  were 


coupling  constraints,  requiring  that  the  positions  of 
every  pair  of  masses  remain  within  0.6.  Hence  the 
coupling  constraint  sets  were  Zc  =  {z  e  91  |  |z|  <  0.6) 
for  all  c  and,  for  the  constraint  cpq  associated  with  the 
pair  (p ,  q),  the  coupling  output  matrices  were 

1  0]>  E0-?<7  —  — 

Ew  =  [°  0]  dr±p,q,  Fw  =  [0]  dp. 

The  horizon  was  set  to  TV  — 10  and  the  candidate 
controllers  were  a  simple  constant  LQR  design 

dpe  {1 . 5}:  Kp(j)  =  [-0.6667  -1.3333] 

Y/€{0,...,A). 

The  algorithm  and  simulation  were  implemented  in 
Matlab,  with  the  optimization  solved  using  the 
“linprog”  linear  program  solver  within  the  Matlab 
Optimization  Toolbox.  For  simplicity,  all  of  the  sub¬ 
problems  were  solved  on  the  same  computer,  running 
Windows  on  a  3.2 GHz  processor  with  1  GB  of  RAM. 
This  is  possible  because  the  sub-problems  are  solved 
sequentially.  While  this  is  not  strictly  a  distributed 
implementation,  the  results  still  illustrate  the  effect  on 
computation  of  dividing  a  single  optimization  into  a 
sequence  of  sub-problems. 


For  the  first  set  of  simulations,  constant  step  distur¬ 
bances  were  introduced  at  the  second  step 

,, ,  _  0,  k  <  1 
’  ~  vip,  otherwise 

and  each  subsystem  was  subjected  to  a  step  taken  from  a 
different  vertex  of  the  disturbance  set,  apart  from  one 
subsystem,  which  received  no  disturbance 


Figure  2  shows  the  results  from  the  application  of 
DMPC  as  described  in  §4  to  this  system.  The  state 
space  trajectories  in  figure  2(a)  show  that  the  local 
constraints  were  always  satisfied  and  each  subsystem 
position  converged  to  the  target  set.  Figure  2(b)  shows 
the  time  histories  of  the  position  of  each  mass. 
The  shaded  tube  is  constructed  to  enclose  ±0.3  about 
the  median  position.  Therefore,  since  the  positions  all 
remain  within  this  tube,  no  two  masses  were  ever  more 
than  0.6  apart  and  so  the  coupling  constraints  were 
satisfied.  Note  that  in  both  figures,  the  system  can  be 
seen  to  run  right  up  to  the  constraint  boundaries. 
Hence  the  key  advantage  that  MPC  is  explicitly  aware 
of  constraints  is  retained  by  DMPC. 

For  comparison  with  the  working  example  in 
figure  2(b),  figure  3  shows  position  time  histories  for 
two  simulations  using  two  different  controllers,  each  of 
which  included  a  deliberate  error  in  the  implementation 
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Figure  3.  Simulation  position  time  histories  for  comparison  DMPC  examples  with  implementation  errors.  Constraint  violations 
can  be  seen  in  both  cases,  (a)  Coupling  constraints  omitted;  (b)  Disturbance  model  reduced  by  10%. 


of  DMPC.  Figure  3(a)  shows  the  results  using  DMPC 
but  with  the  coupling  constraints  omitted  in  the  optimi¬ 
zations.  This  is  equivalent  to  using  MPC  independently 
for  each  subsystem  subject  only  to  its  local  constraints. 
Since  the  coupling  constraints  were  violated  in  this 
case,  we  infer  that  the  satisfaction  of  the  constraints 
seen  in  figure  2(b)  is  a  direct  result  of  the  communication 
in  the  DMPC  algorithm. 

Figure  3(b)  shows  another  set  of  position  results, 
this  time  using  DMPC  with  coupled  constraints  but 
with  the  disturbance  models  Wp  used  for  constraint 
tightening  reduced  by  10%.  The  disturbances  applied 
in  the  simulation  remained  as  described  above,  and 
therefore  the  requirement  that  w p(k)  e  Wp  from  (2)  is 
not  satisfied.  Since  the  guarantee  of  feasibility  does  not 
hold  in  this  case,  the  controller  was  configured  to 
apply  u p(k)  =  Kp(())xp(k)  if  the  sub-problem  Pp(xp(k)) 
could  not  be  solved.  Feasibility  was  lost  during  the 
simulation,  and  the  figure  shows  that  constraint 
violations  occurred.  Therefore,  since  the  controller  fails 
when  the  disturbance  exceeds  its  assumed  bound  by  a 
relatively  small  amount,  the  success  of  the  results 
shown  in  figure  2  can  be  attributed  to  the  correct 
constraint  tightening. 

To  investigate  the  computation  and  performance 
trades  of  DMPC  in  comparison  with  centralized  MPC, 
the  point  mass  problem  was  solved  with  both  methods 
for  varying  numbers  of  subsystems  between  Np  —  2 
and  Np  =  20.  DMPC  was  implemented  as  described  in 
§4  and  the  disturbances  were  chosen  randomly  at 
each  step  from  within  the  assumed  bounds.  Figure  4 
compares  the  solution  times,  taken  as  the  total  of  all 
the  optimization  times  over  a  20  step  simulation,  for 


MPC  and  DMPC.  The  solution  times  for  DMPC  are 
broken  down  per  sub-problem,  but  the  overall  height 
of  the  bar  is  the  appropriate  measure  as  sub-problems 
must  be  solved  in  sequence.  Note  that  the  vertical 
scales  are  different:  for  Np  =  20  subsystems,  DMPC  is 
over  five  times  faster  than  normal  MPC.  Figure  5 
compares  the  performance  of  the  two  controllers, 
taken  as  the  sum  of  the  performance  objective 
Y^,L\  12kL\  d(s p(k),Sp),  for  the  same  range  of  problems. 
Each  bar  is  broken  down  by  subsystem  in  both  cases. 
The  scales  of  both  plots  are  identical,  and  hence  it  is 
clear  that  there  is  very  little  performance  difference. 
Closer  inspection  reveals  that  DMPC  has  slightly 
poorer  performance  than  MPC.  This  is  to  be  expected, 
as  DMPC  approximates  the  overall  optimization  using 
the  sub-problem  sequence  and  therefore  is  a  more 
constrained  solution  approach.  However,  taking 
figures  4  and  5  together,  DMPC  is  shown  to  offer 
significant  computational  benefits  for  only  a  slight 
penalty  in  performance. 

8.  Conclusion 

A  formulation  for  distributed  model  predictive  control 
(DMPC)  has  been  presented.  It  solves  the  problem  of 
control  of  multiple  subsystems  subjected  to  coupled 
constraints  but  otherwise  independent.  Each  subsystem 
solves  a  sub-problem  involving  only  its  own  state 
predictions.  The  scheme  is  proven  to  guarantee  robust 
constraint  satisfaction  and  convergence  under  the 
assumption  of  bounded  disturbances.  Each  sub-problem 
is  guaranteed  to  be  feasible,  convergence  is  guaranteed. 
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Number  of  subsystems 


Figure  4.  Solution  time  comparisons  of  MPC  and  DMPC 
for  different  numbers  of  subsystems.  Note  different  vertical 
scales. 


DMPC  CMPC 


0  5  tO  15  20  25  0  5  10  15  20  25 


Number  of  subsystems  Number  of  subsystems 

Figure  5.  Performance  comparisons  oT  MPC  and  DMPC  for 
different  numbers  of  subsystems,  broken  down  by  individual 
subsystem. 


and  no  iteration  between  subsystems  is  required.  Results 
for  a  simple  example  have  shown  that  the  computation 
for  DMPC  is  faster  and  more  scalable  than  the 
equivalent  centralized  controller  and  that  there  is  very 
little  performance  degradation  in  switching  from 
centralized  MPC  to  DMPC. 
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Optimization  using  Receding  Horizon  MILP 
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Abstract 


Motivated  by  recent  research  on  cooperative  UAVs,  this  paper  introduces  a  new  decentralized  trajectory  optimiza¬ 
tion  approach  for  systems  with  independent  dynamics  but  coupled  constraints  and  objectives.  The  overall  goal  is  to 
develop  a  decentralized  approach  that  solves  small  subproblems  while  minimizing  a  fleet-level  objective.  In  the  new 
algorithm,  vehicles  solve  their  subproblems  in  sequence,  while  generating  feasible  modifications  to  the  prediction 
of  other  vehicles’  plans.  In  order  to  avoid  reproducing  the  global  optimization,  the  decisions  of  other  vehicles  are 
parameterized  using  a  much  smaller  number  of  variables  than  in  the  centralized  formulation.  This  reduced  number  of 
variables  is  sufficient  to  improve  the  cooperation  between  vehicles  without  significantly  increasing  the  computational 
effort  involved.  The  resulting  algorithm  is  shown  to  be  robustly  feasible  under  the  action  of  unknown  but  bounded 
disturbances.  Furthermore,  the  fleet  objective  function  is  proven  to  monotonically  decrease  while  going  through  the 
vehicles  in  the  fleet  and  over  the  time.  Simulation  results  are  presented  to  compare  the  distributed,  centralized,  and 
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simulations  and  a  hardware  experiment  demonstrate  that  the  proposed  algorithm  can  improve  the  fleet  objective  by 
temporarily  having  one  vehicle  sacrifice  its  individual  objective,  showing  the  cooperative  behavior. 


Index  Terms 

Cooperative  control.  Distributed,  Multi-vehicle  experiments.  Robust  Model  Predictive  Control  (MPC). 


I.  Introduction 

Teams  of  multiple  UAVs  have  many  applications,  such  as  border  patrol,  cooperative  search  and  track,  and  mobile 
sensor  network  [  1  ]— [5],  To  enable  fleet-level  cooperation,  the  overall  control  system  must  properly  capture  the 
complex  interactions  between  the  vehicles  and  tasks.  One  approach  is  to  solve  this  problem  globally,  but  centralized 
algorithms  typically  scale  very  poorly  with  the  fleet  size  because  of  the  computational  effort  involved.  A  natural 
approach  to  decompose  the  centralized  problem  is  to  let  each  vehicle  optimize  its  own  decision  variables.  A  key 
challenge  of  decentralized  control  is  to  ensure  that  the  distributed  decision  making  leads  to  actions  that  satisfy  the 
coupling  constraints  with  other  vehicles  [6],  [7],  so  that  the  fleet  as  a  whole  executes  consistent  plans.  Various 
approaches  have  been  investigated  to  address  this  problem,  including  treating  the  influence  of  other  subsystems  as 
an  unknown  disturbance  [8],  coupling  penalty  functions  [9]— [  11],  partial  grouping  of  computations  [12],  loitering 
options  for  safety  guarantees  [13],  and  dynamic  programming  [14],  [15].  Some  approaches  involve  iterative  ne¬ 
gotiations  between  subsystems  [11],  [16]  and  apply  game  theory  to  study  convergence.  Decentralization  is  further 
complicated  when  disturbances  act  on  the  subsystems,  making  the  prediction  of  the  future  behavior  uncertain. 
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For  the  trajectory  planning  problem,  several  distributed  control  architectures  have  been  proposed  that  design  paths 
locally  from  the  current  vehicle  location.  Much  of  the  current  research  on  the  distributed  path  planning  assumes  that 
each  vehicle  solves  a  local  problem  and  communicates  this  intent  information  to  its  neighbors  [13],  [17]-[21].  These 
decentralized  algorithms  typically  lead  to  a  Nash  equilibrium  [22]  or  a  Pareto  optimal  surface  [17],  [23],  which 
is  not  necessarily  the  globally  optimal  solution,  because  these  so-called  “communication-based  approaches”  [24] 
do  not  use  the  information  about  the  objective  functions  of  the  other  subsystems  and  do  not  consider  the  overall 
performance.  For  example,  the  recently  proposed  trajectory  planning  algorithm  called  Decentralized  Robust  Safe  but 
Knowledgeable  (DRSBK)  generates  a  local  plan  over  a  short  horizon  while  guaranteeing  the  robust  feasibility  of  the 
entire  fleet  under  the  action  of  external  disturbances  [20],  This  algorithm  uses  distributed  robust  Model  Predictive 
Control  [18]  to  predict  and  account  for  other  vehicles’  behavior.  However,  because  each  vehicle  only  optimizes 
for  its  own  control  and  freezes  the  decisions  of  other  vehicles,  the  resulting  trajectories  can  be  locally  optimal  but 
globally  suboptimal  (i.e.,  coordinated  but  non-cooperative )  [24]. 

This  paper  extends  the  DRSBK  algorithm  to  enable  cooperation  amongst  the  vehicles  in  the  fleet.  Each  vehicle 
solves  its  subproblem  in  sequence  to  optimize  its  own  control  input,  as  in  DRSBK,  but  the  subproblems  can 
also  explicitly  perturb  the  decisions  of  other  vehicles  to  improve  the  global  cost.  The  challenge  here  is  to  avoid 
reproducing  the  global  optimization  for  each  vehicle.  The  proposed  approach  uses  a  low-order  parametrization 
of  the  other  vehicles’  decisions  to  reduce  the  solution  space  while  retaining  the  freedom  to  alter  key  aspects  of 
other  vehicles’  plans.  This  effectively  enables  a  negotiation  between  the  vehicles  with  only  a  small  increase  in  the 
computational  complexity  of  the  subproblem  optimizations. 

The  paper  begins  with  the  problem  setup  in  Section  II.  Following  the  centralized  algorithm  in  Section  III  and 
the  decentralized  non-cooperative  algorithm  in  Section  IV,  Section  V  presents  the  cooperative  form  of  robust 
decentralized  trajectory  optimization  algorithm  using  receding  horizon  MILR  Section  VI  proves  that  the  new 
algorithm  retains  the  robust  feasibility  and  that  each  solution  of  the  subproblem  monotonically  decreases  the  global 
cost.  Section  VII  shows  simulation  results  and  compares  the  performance  and  computation  time  of  the  algorithm. 
Finally,  experimental  results  are  presented  in  Section  VIII. 

II.  Problem  Statement 

Notation:  In  this  paper,  the  index  or  superscript  p,  q  denotes  the  vehicle  index,  index  k  denotes  the  current  time 
step,  and  index  j  denotes  the  prediction  step.  There  are  total  of  n  vehicles.  Unless  otherwise  noted,  Vp  implies 
Vp  =  1 , . . . ,  n,  and  \/q  implies  \/q  =  1, . . . ,  n  but  q  ^  p.  The  neighbor  set  A/J  is  a  set  of  vehicles  that  have  coupling 
constraints  with  vehicle  p  at  time  k.  It  also  determines  the  order  that  the  vehicles  solve  their  subproblcms.  pre(p) 
and  next(p)  respectively  denote  the  vehicle  that  solves  the  subproblem  immediately  before  and  after  vehicle  p. 

A  fleet  of  n  vehicles  are  assumed  to  have  independent  dynamics,  which  are  described  by  the  LTI  model 

xk+ i  =  Axk  +  Bul  +  wh  (0 

where  xpk  is  the  state  vector  of  size  nx,  upk  is  the  input  vector  of  size  nu,  and  wk  is  the  disturbance  vector  for 
the  p,h  vehicle.  The  disturbances  xupk  are  unknown  but  lie  in  known  bounded  sets  wk  6  Wp.  The  environment  has 
obstacles  to  be  avoided,  and  the  vehicles  have  flight  envelope  limitations.  The  general  output  sets  yp  capture  these 
local  constraints  of  each  vehicle 

ypk  =  Cxpk  +  Duvk  G  yP'  Vfc-  (2) 

The  coupling  between  vehicles,  such  as  vehicle  avoidance,  inter-vehicle  communication  range,  and  line-of-sight 
between  vehicles  is  captured  by  a  further  set  of  constraints  applied  to  the  sum  of  outputs  from  each  vehicle 

zpk  =  Epxpk,e  Vp,  V/c  (3a) 

X>*ez.  (3b) 

p=  1 

For  pair-wise  collision  avoidance  constraints,  there  are  only  two  matrices  Ep  and  Eq  that  have  nonzero  entries  for 
a  given  row  of  the  output  matrix  E ,  and  the  set  Z  is  non-convex. 

Finally,  the  goal  of  the  trajectory  optimization  is  to  minimize  the  fleet  cost  composed  of  individual  costs,  which 
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could  be  conflicting.  Optimization  is  performed  to  obtain  the  optimal  input  sequence  for  each  vehicle 

min  Jn).  (4) 

■u£,Vp,Vfc 

III.  Centralized  Approach 

The  centralized  approach  directly  solves  the  full  optimization  in  Section  II.  Let  Up  denote  the  control  input 
sequence  of  the  vehicle  p  over  the  next  N  steps,  i.e.,  Up  =  [uq, ....  upN  ,  ]  .  Then,  a  compact  form  of  the 
optimization  can  be  written  as 

min  (5) 

u1,...,un 

subject  to  "Vp  ■  g(Up)  <  0 

h(Up,  U")  <0,  q  e  Mp 

where  g(Up)  represents  the  local  constraints  for  vehicle  p,  and  h(Up,Uq)  represents  the  coupling  constraints 
between  vehicles  p  and  q.  This  approach  produces  the  globally  optimal  solution;  however,  it  scales  poorly  because 
the  optimization  becomes  very  complex  for  large  fleets  for  most  problem  types  (i.e.,  quadratic  programming  and 
mixed-integer  linear  programming)  [25]. 


IV.  Decentralized  Robust  RHC 


This  section  briefly  describes  the  DRSBK  algorithm  presented  in  [20],  which  is  a  decentralized  approach  that 
decomposes  the  centralized  problem  (5)  into  smaller  subproblems.  DRSBK  is  a  receding  horizon  algorithm,  which 
solves  online  optimization  that  develops  the  control  inputs  for  a  finite  horizon  of  N  steps,  and  implements  only  the 
first  step.  The  optimization  is  repeated  as  the  system  evolves.  To  reduce  the  computational  burden,  DRSBK  uses 
a  short  planning  horizon,  and  the  remainder  of  the  path  to  the  target  is  represented  by  a  sophisticated  cost-to-go 
function  that  is  cognizant  of  obstacles  in  the  cluttered  environment.  Invariance  constraints  are  imposed  at  the  terminal 
step  of  the  short  plan  to  ensure  the  safety  of  the  vehicle  against  the  potential  changes  in  the  environment  beyond 
the  planning  horizon.  Each  vehicle  generates  its  control  inputs  by  solving  a  subproblem  in  sequence,  while  freezing 
the  plan  of  other  vehicles.  The  solution  is  then  communicated  to  other  vehicles  in  the  fleet.  The  advantages  of  this 
algorithm  include  robust  constraint  satisfaction  under  the  action  of  disturbance,  much  better  scalability  compared 
to  the  centralized  approach,  and  requiring  only  local  communication  [20]. 

To  simplify  the  presentation,  the  planning  order  is  assumed  to  be  l,...,n.  The  prediction  of  a  value  at  time 
(k  +j)  made  at  time  k  is  denoted  by  subscript  (Otfe+jl*.  Unless  otherwise  noted,  Vj  implies  j  =  0, . . . ,  N  -  1,  and 
implies  V)  =  0, . . . ,  JV  —  2.  At  time  k,  the  p,h  subsystem  generates  its  own  control  inputs  by  solving  the 
following  subproblem 


JP*  =  mPin/P(xfc+A'|fc) 

s-t-  vi :  xl\k  =  xl 

~P  _  AF~P  4-  Rp?/p 

Xk+j  +  l\k  -  n  Xk+j\k  +  k+j\k 

yvk+j\k =  CPxl+j\k  +  DT>uk+j\k e  y1] 

zk+]\k  =  Epxk+j\k 

ZP  +zp  <=  7P 
zk+j\k  +  Zk+j\k 

Tp  F  Op 
Xk+N\k  y k ' 


(6) 

(7) 

(8) 
(9) 

(10) 

(ID 

(12) 


Here,  the  objective  function  for  the  entire  fleet  is  assumed  to  be  the  sum  of  the  individual  costs.  The  vector  zp+j |fc 
is  a  summation  of  the  outputs  from  the  other  vehicles 


Zk+j\k  ~  /I*  Zk+j\k  Zl+j\k-l 

q  €  2V'.P  q  £  . 

<I<P  q>p 


(13) 


and  is  constant  in  this  local  optimization.  The  first  summation  is  from  the  vehicles  that  have  already  planned  at 
time  k.  The  second  summation  is  from  the  vehicles  that  have  not  planned  at  time  k.  so  that  a  prediction  made  at 
the  previous  time  (k  -  1)  is  used. 
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The  prediction  (8)  does  not  include  the  disturbance,  but  the  constraint  sets  yp  differ  from  the  original  set  yp. 
To  save  some  margin  to  account  for  the  uncertainties  in  the  future,  the  set  yp  is  tightened  with  the  prediction  step 


j  to  form  yj.  A  linear  controller  is  used  to  rejects  the  disturbance  [26] 

yp0  =  yp 

yp+ 1  =  yp  ~  (cplp  +  dppp)wp  vr  (i4> 

where  the  operator  ~  denotes  the  Pontryagin  difference  [27],  and  Lp  is  the  state  transition  matrix 

Lp  =  I 

Lj+i  —  ApLp  +  BpPp  Vi".  (15) 

Similar  tightening  is  performed  on  the  coupling  constraint  sets  in  (1 1),  allowing  uncertainty  margin  for  all  subsystems 

Zln  =  Z  (16a) 

2fea  =  z*  ~  EqL]Wq,  Vi,  q  e  q  ±  Po  (16b) 

Z?l J  =  Zf  ~  Ep°Lf\yp°,  Vj-  (16c) 


where  p0  and  pn  represent  the  first  and  the  last  element  of  the  neighbor  set  Nk,  respectively.  The  operation  (16b) 
tightens  the  constraints  from  the  vehicle  q  to  pre(g).  This  tightening  represents  that  the  vehicle  pr e(q),  which 
generates  the  plan  prior  to  vehicle  q,  saving  some  margin  for  the  vehicle  q  so  that  q  can  use  it  to  reject  the 
disturbances  Wq.  The  operation  (16c)  tightens  the  constraints  from  the  prediction  step  j  to  j  +  1.  This  represents 
that  the  optimization  at  time  k  for  vehicle  pn  saves  some  margin  so  that  the  optimization  at  time  (k  +  1)  for  vehicle 
Po  can  use  it  to  also  reject  the  disturbances. 

The  set  Qpk  in  (12)  is  called  a  safety  set,  and  the  terminal  states  of  the  plan  must  end  in  this  set  [28],  By  using 
nilpotent  candidate  controllers  Pp,  which  makes  =  0  in  (15),  the  safety  set  can  be  expressed  as  a  nominal 
control  invariant  admissible  set,  which  has  the  following  invariance  property 

Vxp  €  Q\  =>  3kp(xp)  s.t. 

Apxp  +  Bpkp(xp)  g  0% 

Cpxp  +  DpKp(xp)  e  ypN _J 

Epxp+  Y,  EQxq  e  ZN-U  V(xpV..,xp")  G  {Qg°  X  ...  x  Qp"}. 

<J  6-Vfc 

where  for  the  vehicles  that  have  not  planned  at  time  k,  the  invariant  set  obtained  in  the  previous  time  (k 
used,  i.e., 

Qk  =  fifc-l’  V9e  {next (p),...,p„} 

and  kp(xp)  is  a  nonlinear  control  law  that  the  vehicle  can  use,  once  in  Qpk,  to  stay  in  Qk.  For  fixed-wing  aircraft, 
one  invariance  example  is  a  loiter  circle  where  kp{xp)  generates  a  centripetal  acceleration  perpendicular  to  the 
heading  direction.  For  rotorcraft,  any  point  with  zero  velocity  is  invariant  with  kp(xp)  =  0. 

DRSBK  minimizes  the  terminal  state  penalty  fp{xP+N\k)  as  the  objective  function  in  (6),  but  the  objective 
function  can  readily  include  the  stage  cost  such  as  control  penalty.  For  the  environment  with  no-fly  zones,  a 
sophisticated  cost-to-go  function  is  used  as  the  terminal  penalty  that  is  calculated  in  two  stages  [29],  First,  prior 
to  online  optimization,  shortest  path  algorithm  is  applied  to  a  graph-based  representation  of  the  environment  to 
estimate  the  approximate  cost  /p(rcomer)  to  fly  from  each  obstacle  comer  rconier  to  the  target.  The  pairs  of  comer 
locations  rcorner  and  the  cost  /p(rcomer)  are  stored  as  a  cost  map.  Then,  the  optimization  chooses  the  best  comer 
rps  that  is  visible  from  a  point  Op  in  the  invariant  safety  set  Qk  of  the  vehicle  p,  so  that  the  cost-to-go  function  is 

Jp  =  fp(*l+Nlk)  =  \\Op-rpJ2+fp(rpJ.  (18) 

For  notational  simplicity  in  the  later  sections,  let  Cp(xp.Uk)  denote  the  set  of  constraints  (7)-(12),  (18)  for 
vehicle  p.  The  first  argument  xp  is  the  initial  condition  used  in  the  right  hand  side  of  (7),  and  the  second  argument 
Uk  denotes  the  control  input  in  the  optimization,  i.e.,  Uk  =  [uP|fcT . •up+;v_1|;>.T]T 


(17) 

—  1)  is 
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Fig.  1 .  Node  p’s  neighbor  set  AT  and  active  coupling  neighbor  set  A,  and  different  types  of  coupling  constraints. 


the  solution  at  time  k,  the  first  control  input  u^k 
time  k  +  1,  the  states  of  each  vehicle  xp+1 


The  optimization  (6)-(12),  (18)  is  implemented  using  mixed-integer  linear  programming  (MILP),  which  can 
express  logical  decisions  such  as  the  choice  of  r7’h,  and  non-convex  constraints  such  as  collision  avoidance.  From 

for  each  vehicle  is  applied  to  the  real  system  (1).  At  the  next 
are  measured,  and  the  optimization  is  repeated.  The  main  result  of 
DRSBK  presented  in  [20]  is  the  robust  feasibility  guarantee,  i.e.  feasibility  of  the  optimization  (7)-(12)  at  time  k 
implies  feasibility  at  time  k  +  1  under  the  action  of  disturbance  Vwk  €  W!>  [30], 

The  primary  advantage  of  this  approach  is  that  the  decision  space  of  the  subproblem  is  approximately  n  times 
smaller  than  the  centralized  approach,  with  many  fewer  constraints.  As  a  result,  the  computation  time  is  much 
smaller,  and  the  algorithm  scales  much  better  than  the  centralized  approach.  Furthermore,  all  the  constraints  are 
robustly  satisfied  for  all  vehicles  at  each  time  step.  However,  since  each  vehicle  does  not  account  for  the  objectives 
of  other  vehicles,  the  resulting  solution  is  coordinated  but  non-cooperative.  This  non-cooperative  solution  is  a  Nash 
equilibrium,  where  no  vehicle  can  improve  its  local  cost  by  changing  only  its  own  decision,  and  is  undesirable. 
The  benefits  and  limitations  of  this  approach  are  clearly  illustrated  in  the  examples  in  Section  VII-B. 


V.  Cooperative  DRSBK  Algorithm 

This  section  extends  the  DRSBK  algorithm  [20]  to  enable  explicit  cooperation.  The  DRSBK  algorithm  guarantees 
the  robust  constraint  satisfaction  under  the  action  of  disturbances.  However,  because  the  objective  function  (6)  does 
not  consider  the  effect  from/on  the  other  vehicles  as  in  (4),  the  resulting  solution  could  be  a  Nash  equilibrium, 
where  the  solution  is  locally  optimal  but  globally  suboptimal.  One  intuitive  approach  to  resolve  this  issue  is  to 
include  all  the  decision  variables  of  other  vehicles  in  each  subproblem.  However,  this  will  reproduce  the  global 
optimization  for  each  vehicle  and  is  clearly  not  scalable.  The  proposed  approach  exploits  the  sparse  structure  of 
the  coupling  constraints  of  trajectory  optimization  and  uses  the  low-order  parametrization  of  the  other  vehicles  to 
reduce  the  dimension  of  the  decision  space. 


A.  Algorithm  Oven’iew 

This  subsection  gives  a  brief  overview  of  the  algorithm  to  present  the  main  idea.  Whereas  DRSBK  freezes  all  the 
decisions  of  other  vehicles,  the  cooperative  form  of  DRSBK  (called  CDRSBK)  updates  the  candidate  solutions  of 
other  vehicles  by  designing  a  feasible  perturbation  to  other  vehicles’  decisions.  Each  vehicle  sequentially  solves  its 
own  subproblem  as  well  as  slightly  modified  subproblems  of  other  vehicles.  The  algorithm  iterates  over  the  team 
of  vehicles,  but  the  simulation  results  in  Section  VII-B  show  that  two  iterations  over  the  fleet  typically  produces  a 
performance  comparable  to  the  centralized  approach. 

Let  ,4P  denote  a  set  of  vehicles  that  have  active  coupling  constraints  with  vehicle  p.  Figure  1  shows  an  example 
of  a  graphical  representation  of  a  vehicle  fleet.  Each  node  represents  a  vehicle,  and  the  arc  connecting  two  nodes 
shows  that  there  is  a  coupling  constraint  between  the  two  vehicles.  The  shaded  nodes  in  the  figure  are  the  neighbors 
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of  vehicle  p ,  i.e.,  91 ,  qA  e  Np.  In  general,  not  all  of  the  coupling  constraints  are  active.  In  this  example,  active 
coupling  neighbors  of  vehicle  p  are  marked  with  thick  lines,  and  <71,92  £  Ap  but  <73,94  ^  Ap. 

The  centralized  optimization  (5)  is  broken  into  the  following  local  optimization  for  each  vehicle  p: 


min  J 

UP 

(19) 

q€AP 

subject  to 

g(Up)  <  0 

(20a) 

g(Uq  +  T9£q)  <  0, 

9  £  Ap 

(20b) 

h{Up,Uq)  <  0, 

qeAfp,  qiAp 

(20c) 

h(Up,  U9  +  T9£q)  <  0, 

q&Ap 

(20d) 

h(Ur,U9  +Tq£q)  <  0, 

reAf9,r(£Ap,qeAp 

(20e) 

h(U91  +  Tq'Cl ,  U 92  +  T92£92)  <  0, 

9i.92  e  Ap. 

(20f) 

The  main  difference  from  the  DRSBK  algorithm  is  that  the  decision  variables  include  Up  for  vehicle  p 

itself  and  £ 9 

for  other  neighboring  vehicles  with  active  couplings.  The  decision  for  9  made  in  p's  optimization  is  a  perturbation 
from  the  communicated  solution  U9,  so  that  the  control  input  of  other  vehicle  9  is  written  as 

Uq  =  U9  +  5Uq  =  U9  +  V9. 

The  perturbation  SUq  is  rewritten  as  8Uq  =  Tq£q  using  a  parametrization  matrix  Tq ,  which  is  elaborated  later. 

The  optimization  includes  the  constraints  for  vehicles  p  and  V9  g  Ap.  The  constraints  (20a)-(20b)  are  the  local 
constraints  for  vehicle  p  and  for  its  active  coupling  neighbors.  The  equations  (20c)-(20f)  express  different  types  of 
couplings  shown  in  Figure  1.  Type  I  is  between  vehicle  p  and  its  neighbors  with  no  active  couplings;  Type  II  is 
between  vehicle  p  and  its  neighbors  with  active  couplings;  Type  III  is  between  vehicle  p’s  active  coupling  neighbors 
and  their  neighbors;  Type  IV  is  between  vehicle  p's  two  active  coupling  neighbors.  Note  that  some  constraints  in 
h(Up,  Uq  +  Tq£q)  <  0  could  be  omitted  if  £q  has  no  impact  on  them  because  of  the  row  rank  deficiency  of  Tq, 
which  is  discussed  in  more  detail  in  Section  V-B. 

A  key  advantage  of  this  algorithm  is  that  it  does  not  freeze  the  other  vehicles’  plans,  so  it  can  avoid  the  Nash 
equilibrium  obtained  from  other  algorithms.  The  next  section  gives  a  method  for  computing  the  parametrization 
matrix  Tq  that  reduces  the  computational  complexity  of  each  local  optimization. 

B.  Reduced-order  Decision  Space 

This  subsection  discusses  the  approach  for  reducing  the  decision  space  and  how  to  compute  the  corresponding 
parametrization  matrix  T 9 . 

I)  Active  Coupling  Constraints:  In  a  typical  trajectory  optimization,  not  all  of  the  coupling  constraints  are  active 
in  the  future  plans.  For  example,  collision  avoidance  problems  typically  have  one  or  two  time  steps  when  the  vehicles 
are  close,  as  shown  in  Figure  2(a).  Furthermore,  the  relative  distance,  i.e.,  the  2-norm  of  the  relative  position  vector, 
can  be  expressed  with  a  set  of  linear  constraints,  but  only  one  of  these  can  be  active,  as  shown  in  Figure  2(b). 
The  approach  presented  here  exploits  this  sparse  structure  of  the  active  coupling  constraints  to  reduce  the  decision 
space.  In  particular,  a  perturbation  is  only  generated  when  there  exists  an  active  coupling  constraint  between  the 
vehicles. 

It  is  assumed  that  the  coupling  constraints  take  a  linear  form 

FPUP  +  FqU9  <  gpq  (21) 

which  can  express  convex  constraints  including  I -norm,  2-norm,  and  oo-norm  bounds.  Using  binary  variables,  non- 
convex  constraints  can  be  also  expressed  in  this  form  in  MILP.  The  subproblem  of  vehicle  p  in  DRSBK  fixed  the 
decision  of  other  vehicles  so  that  (21)  is 

FPUP  +  FqUq  <  gpq. 
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(a)  Trajectory  of  two  vehicles.  The  collision  avoidance  con-  (b)  Two-norm  constraints  expressed  with  a 
straint  is  active  only  at  one  time  step.  set  of  linear  constraints. 

Fig.  2.  “Sparsity”  examples  in  the  trajectory  optimization 


One  cooperative  form  is  to  change  the  right  hand  side  in  p’s  optimization 

FPUP  +  FqUq  <  gpq  -  0  (22) 


and  implicitly  expect  the  vehicle  q  will  take  advantage  of  this  change  in  q's  next  optimization.  The  variable  0  only 
needs  to  change  the  rows  corresponding  to  the  active  coupling  constraints,  because  a  change  in  the  active  coupling 
constraints  can  directly  change  the  decision  space  Uq ,  leading  to  the  potential  change  in  the  cost  of  vehicle  q.  The 
following  form  expresses  the  decision  change  of  q  more  explicitly. 

FPUP  4-  Fq(Uq  +  6Uq)  <  gpq  (23) 


Without  loss  of  generality,  the  upper  rows  of  (22)— (23)  can  be  regarded  as  active  and  the  lower  rows  as  inactive. 
By  comparing  (22)  and  (23),  we  have 


Fq  ■ 

L  inactive  . 

Now  the  goal  is  to  change  Active  by  5Uq.  Note  that  because  there  are  only  a  few  coupling  constraints  that  are  active, 
dim( Active)  is  small.  The  key  here  is  that  5Uq  can  be  parameterized  using  a  variable  of  smaller  dimension. 

Let  m  denote  the  row  rank  of  Fqane,  which  is  also  the  number  of  elements  in  Active  that  any  5Uq  can  change 
independently.  Therefore,  a  new  variable  €  Km  could  replace  5Uq,  where  in  the  trajectory  optimization  problems 
the  dimension  m  of  £q  is  significantly  smaller  than  the  dimension  of  SUq.  Let  A  denote  a  matrix  composed  of  the 
m  independent  row  vectors  extracted  from  Fqclne.  Then,  5Uq  is  parameterized  by  F  as 

5Uq  =  AT(AAT)~l  iq  =Tq£,q .  (24) 


The  inverse  in  this  equation  exists  because  the  product  (AAT)  is  of  full  rank  to,  so  the  parametrization  matrix  Tq 
also  exists.  The  dimension  of  F  is  much  smaller  than  that  of  the  original  control  input  variables  SUk,  so  the  row 
size  of  Tq  is  larger  than  the  column  size.  In  the  examples  considered  in  Sections  VII-C  and  VIII,  the  row  size  of 
Tq  is  2-6  times  larger  than  the  column  size,  illustrating  the  sparseness  assumption. 

2)  Terminal  States:  Extra  degrees  of  freedom  for  the  perturbation  to  q's  decision  could  be  obtained  by  parame¬ 
terizing  the  decision  variables  through  the  terminal  states  in  addition  to  the  active  coupling  constraints.  The  main 
motivation  is  that  the  performance  of  DRSBK  algorithm  depends  strongly  on  the  location  of  the  invariant  safety 
set  in  which  the  terminal  states  xk+N\k  he.  Since 


Xk+N\k  =  Xl\k  +  M 


A'-l 


B, 


,AB ,  B] 


LUk+N-l\k. 


=  xl\k+cvl 
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the  perturbation  in  the  control  inputs  are  parameterized  as 


m  =  Ct{CCt)-1Sx1.n \k  =  T*?. 


(25) 


This  parametrization  matrix  Tq  always  exists  when  the  system  is  controllable  and  hence  the  matrix  C  has  full  row 
rank.  The  parametrization  (25)  reduces  the  dimension  of  the  decision  space  from  $U%,  which  is  Nnu,  to  Sxqk+N jfc, 
which  is  nx.  Note  that  for  collision  avoidance  problem,  if  vehicles  p  and  q  are  far  apart  and  do  not  interact  with 
each  other,  the  variable  E,q  that  generates  perturbation  5Uk  has  zero  dimension  in  p’s  optimization  because  the 
coupling  constraints  are  not  active. 

3)  Constrained  Terminal  States :  When  an  invariant  constraint  is  imposed  on  the  terminal  states  of  the  plan 
(e.g.,  the  hovering  constraint  for  rotorcraft),  the  parametrization  matrix  must  be  formed  differently.  This  subsection 
discusses  the  parametrization  when  the  terminal  velocity  cannot  be  perturbed.  Let  Cpos  =  [I,  O]  and  <7vei  =  [O,  I] 
for  a  double  integrator  system,  where 


Srq 

Udjk+N\k 


^rk+N\k 

Cpos 

l+N\k 

Cvel 

C5U* 


In  order  to  keep  the  same  terminal  velocity,  i.e„  <wqk+N\k  =  0,  the  perturbation  5Uk  must  lie  in  the  null  space  of 
CveiC.  Such  61! I  can  be  realized  using  a  new  vector  r) 


5Uqk  =  En 


(26) 


where  E  denotes  an  orthonormal  basis  for  the  null  space  of  CveiC.  With  (26),  the  perturbation  to  the  terminal 
position  is  written  as 

^rk+N\k  =  Cpos  CEt). 

By  making  the  perturbation  Sr9k+N |fc  as  the  decision  variable  ?/  is  parameterized  as 

r)  =  (CPo*CE)t(  (CposCE)(CposCE)rylC- 


Combining  this  with  (26)  gives 

5Uqk  =  E(CposCE)r(  (CposCE)(CposCE)Tyyq  4  Tq£q.  .  (27) 

This  5Uk  changes  the  terminal  position  rqk+N |fc  without  changing  the  terminal  velocity  vqk+N\k.  Note  that  in 
(27)  has  the  same  dimension  as  the  position,  whereas  the  dimension  of  in  (25)  is  nx. 

4)  Fixed  Binary  Variables:  To  account  for  non-convex  constraints  such  as  vehicle/obstacle  avoidance  or  minimum 
speed,  the  overall  optimization  is  implemented  using  MILP.  Binary  variables  in  MILP  encode  logical  constraints 
or  non-convex  constraints  but  are  the  major  source  of  the  computational  complexity  in  the  MILP  solution  process. 
With  the  goal  of  obtaining  small  perturbations  for  other  vehicles’  decision,  the  CDRSBK  algorithm  fixes  binaries 
of  other  vehicles,  while  solving  for  perturbations  of  their  continuous  variables. 

Figure  3  shows  the  effect  of  fixed  binaries  used  to  express  the  obstacle  avoidance  constraints  (detailed  equations 
given  by  (40)  in  Appendix).  In  region  A,  the  admissible  binaries  are  [0,  1,  1,  1]  where  1  means  the  avoidance 
constraint  is  relaxed.  In  region  C,  the  binaries  are  [1,  1,  1,  0],  The  regions  Bi,  B2  have  three  possible  binary 
settings  [0,  1,  1,  1],  [1,  1,  1,  0],  and  [0,  1,  1,  0],  and  the  output  of  the  MILP  solver  could  be  any  of  them.  If  the 
binaries  are  fixed  [0,  1,  1,  0]  for  a  point  in  region  B,  which  is  a  union  of  Bi  and  B2,  then  the  perturbed  point 
must  also  stay  inside  the  region  B  only.  To  enable  larger  perturbations,  CDRSBK  algorithm  performs  a  MILP 
pre-processing  and  sets  [0,  1,  1,  1]  as  the  binaries  of  the  point  in  B!,  which  allows  the  point  to  move  within  A, 
Bi,  and  B2.  Similarly,  the  point  in  B2  uses  the  binary  setting  [1,  1,  1,  0],  As  an  illustration  of  this  binary  fix. 
Figure  3  shows  feasible  and  infeasible  perturbations  from  a  point  •  with  o  and  x  respectively.  The  binaries  of  the 
non-convex  minimum  speed  constraints  are  fixed  in  the  same  way  to  allow  for  maximum  perturbation,  as  shown 
in  Figure  4. 

Note  that  the  problem  statement  includes  many  binary  variables  that  are  effectively  fixed  or  constraints  that  are 
always  satisfied  (e.g.  the  lower  left  side  of  constraints  in  Figure  4).  The  MILP  solver  CPLEX  eliminates  these 
redundant  variables  and  constraints  in  the  pre-solve  step,  and  the  size  of  the  CDRSBK  subproblem  increases  only 
slightly  from  the  DRSBK  subproblem.  In  the  example  shown  later  in  Section  VII-C,  the  number  of  variables  after 
the  CPLEX  pre-solve  increased  by  15%  for  continuous  variables  and  1%  for  binary  variables. 
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Fig.  3.  Feasible  perturbation  of  the  position,  with  fixed  obstacle 
avoidance  binaries 


For  the  fixed-wing  aircraft,  when  the  terminal  safety  constraint  requires  the  plan  to  end  in  a  loiter  circle,  there 
is  one  exception  in  the  binary  fix.  The  location  of  the  safety  circle  has  a  large  effect  on  the  objective  value,  and 
therefore  it  is  beneficial  to  keep  the  binary  variable  that  selects  left  or  right  circle  (bqeft  in  (44)  in  Appendix)  as  a 
decision  variable  in  p’s  optimization. 


C.  Subproblem 

This  subsection  gives  a  brief  explanation  of  the  subproblem  that  each  vehicle  solves.  The  Appendix  gives  a 
detailed  implementation  using  MILP.  Let  Cp(xk,Uk)  denote  a  set  of  constraints  that  is  the  same  as  Cp(xp,Uk) 
except  that  the  tightened  constraint  equations  (9)  and  (11)  are  replaced  by  the  following 

Vi:  vl+j\ktyPj+ J  (28) 

Zk+j\k  +  Zk+j\k  e  ZJ  +  l  (29) 

with  =  yT^_i  and  ZPN  4  ZPN_1,  which  are  consistent  with  the  tightening  equations  (14)  and  (16)  under  the 
nilpotency  assumption  LPN_ x  =  0.  Then,  the  vehicle  p  solves  the  following  optimization  Ppk: 

min  J 

s.t.  (24),  (25)  or  (27), 

F^ixlUl  +  SUl) . Q^ixl'Wr1  +  dl/r1),  Cp(xp.  Up), 

tP+1Kt^+1  +  ^P+1)’  -  •  m)  <0  (30) 

The  function  F(-)  represents  the  constraints  imposed  on  all  vehicles.  In  this  optimization,  vehicle  p  designs  its  own 
control  Uk  as  well  as  perturbation  to  other  vehicles'  control  SUl,. ..  ,SUk~1,  SUk+l , . . .  ,SUk.  The  notation  ()9 
denotes  the  variables  that  are  received  from  other  vehicles  and  are  constant  in  vehicle  p’s  subproblem,  so  that  the 
control  inputs  for  vehicle  q  are 

UZ  =  V$  +  6Ul.  (31) 

As  shown  in  (30),  if  a  vehicle  q  has  already  planned  at  time  k,  i.e.,  q  <  p,  then  the  vehicle  p  uses  Cq(xqk,  Uk  +  5(Jk) 
as  the  constraints  for  q.  Otherwise,  the  constraints  for  q  are  C Uk  +  5Uk),  where  the  initial  states  are  the 
states  predicted  in  the  previous  plan.  The  objective  is  to  minimize  tne  fleet  cost  (4).  After  solving  the  optimization, 
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Algorithm  1  Cooperative  DRSBK 

l.  Find  a  feasible  solution  of  the  DRSBK  optimization  starting  from  the  initial  states  Xq,  communicate  the 
solutions,  form  Uq,. . . ,  Uq,  and  set  k  =  1. 

2:  for  k  —  1  to  k  =  oo  do 

3:  Form  the  candidate  control  from  the  previous  solution  Uk  1 , . . . ,  Uk_1  using  (32). 

4:  repeat 

5:  for  p  =  1, . . .  ,n  do 

6:  Measure  the  current  states  and  form  xpk. 

7:  Solve  the  subproblem  P£  and  obtain  the  solution  Uk* ,  £q*. 

8:  Update  the  control  Uj!  :=  Uqk  :=  U?  +  Tq£q*. 

9:  Send  the  solutions  to  the  next  vehicle. 

10:  end  for 

li:  until  converges  or  reaches  the  iteration  limit 

12.  Apply  the  first  step  of  the  control  inputs  upk  =  u^k  to  the  system  (1),  Vp. 

13:  end  for 


the  solution  (■)*  updates  the  control  as 


0pk  :=  Ul* 

Uq:=Uq+6Uf,  Vq^p 


which  are  sent  to  the  next  vehicle  p  +  1. 

The  full  CDRSBK  algorithm  is  shown  in  Algorithm  1.  In  the  line  7  of  Algorithm  1,  the  vehicle  p  makes  a 
decision  on  itself'  unci  its  neighbors.  As  shown  in  line  3,  when  the  time  step  is  incremented,  the  c2.ndid2.te  decisions 
Uk  for  the  current  time  are  constructed  from  the  decisions  Uk_1  made  at  the  previous  time,  using 


Vp  : 


~,p 

Uk+j\k 


lk+N-l\k 


Yf 


-  KP(Xk+N-l\k-l)' 


(32a) 

(32b) 


This  operation  shifts  the  plan  Uk__1  by  one  time  step  and  appends  the  terminal  step  taken  from  the  invariance 
control  law  np(x)  in  (17).  The  states  xpk+N_1^k_1  are  obtained  through  the  state  equation 


™P  —  \P7~V  i  R  P.-.P 

J'fe+JV  +  l|fc-l  ^  ^k+Nlk-l  ^  u  Uk+N\k-1' 

Note  that  the  local  optimization  (19)  requires  the  knowledge  of  other  vehicles’  cost  function,  but  this  depends  only 
on  the  target  score,  target  location,  vehicle  states,  etc.,  and  is  simple  to  communicate. 


D.  Implementation  with  Non-zero  Computation  Time 

To  run  the  algorithm  in  real-time  on  the  vehicle  hardware,  non-zero  computation  and  communication  times  must 
be  handled.  This  is  done  by  propagating  the  measured  states  when  forming  xpk  in  line  6  of  Algorithm  1.  Figure  5 
shows  the  time  lines  of  the  CDRSBK  algorithm  implementation. 

The  discrete  time  step  k  is  defined  as  the  time  when  the  control  is  executed  in  line  12  of  Algorithm  1.  The 
latest  measurement  is  taken  r  seconds  before  the  discrete  time,  where  r  is  an  upper  bound  on  the  computation 
and  communication  time  to  the  discrete  time.  As  shown  with  the  gray  arrow,  the  vehicle  propagates  the  measured 
states  up  to  the  discrete  execution  time  using  a  model  of  the  vehicle  and  the  low-level  controller.  The  propagated 
states  Xjt|L-  are  then  used  as  the  initial  condition  of  the  optimization  at  time  k.  When  the  last  vehicle  finishes  its 
computation,  it  broadcasts  the  final  plans  to  the  fleet,  and  all  vehicles  implement  the  control  at  the  same  time.  Note 
that  when  the  vehicle  p  is  planning  at  time  k,  the  prediction  for  vehicle  p  -  1  is  based  on  the  latest  states 
and  the  inputs  u^1,  but  the  prediction  for  vehicle  p+  1  is  based  on  the  states  x^1  j  and  the  inputs  that 

are  predicted  at  the  previous  time  k  —  1. 

By  aligning  the  states  and  the  control  inputs  of  all  vehicles  at  the  discrete  time  steps,  this  framework  compiles 
various  sources  of  uncertainties  into  one  and  allows  us  to  treat  them  as  a  single  prediction  error,  which  can  be 
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Fig.  5.  Time  flow  of  CDRSBK  algorithm  with  non-zero  computation  and  communication  time 


calculated  simpiy  as  w\ =  xk,k  -  xk^k_  x .  Future  work  will  extend  this  implementation  to  reduce  the  delay 
associated  with  the  propagation,  as  studied  in  [31]. 

VI.  Algorithm  Properties 

This  section  discusses  the  two  important  properties  of  CDRSBK  algorithm  in  terms  of  constraint  satisfaction  and 
the  performance. 

A.  Robust  Feasibility 

The  following  theorem  guarantees  the  robust  feasibility  of  the  entire  fleet  after  solving  each  subproblem. 
Theorem  1:  If  a  feasible  solution  C/q  , . . . ,  U£  to  the  following  constraint  ((30)  with  k  =  0,  p  =  n)  is  found 

F(c\xlUl0),...Xn(xZ,UZ))  <  0  (33) 

then,  the  system  (1)  controlled  by  Algorithm  1  satisfies  the  constraints  (2)-(3)  under  the  action  of  disturbance 
wk  €  VVP  for  all  vehicles  p  and  all  future  times  k(>  0). 

Proof:  The  proof  builds  on  the  robust  feasibility  property  of  DRSBK  algorithm,  using  the  fact  that  the  CDRSBK 
algorithm  reduces  to  the  DRSBK  algorithm  when  no  perturbations  are  allowed.  In  DRSBK,  it  was  shown  in  [20] 
that  given  a  feasible  solution  at  time  k  -  1,  the  following  candidate  solution 


'^'k+j\k  ~  Uk+j\k-l  +  Pj  +  lUk-l’  ^ 

(34a) 

K+3\k  =  ^l+j\k-l+LVJWl-V  v 

(34b) 

™k+N~\\k  = 

(34c) 

—  APi-P  _l  RPiip 

aA-+IV|Jc  —  k+N—l\k  '  °  uk+N-l\k 

(34d) 

is  feasible  to  vehicle  p's  optimization  at  time  k  for  any  disturbance  realization  wk_r 
(34)  for  its  own  decision  and  using 

Therefore,  in  CDRSBK,  using 

Cr 

> 

o 

II 

Cr 

£ 

(35) 
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for  other  vehicles’  decisions  form  a  feasible  candidate  solution  for  p’s  optimization.  This  shows  the  feasibility  at 
time  k  —  1  implies  the  feasibility  at  time  k.  By  recursion,  if  the  optimization  at  time  k  —  0  is  feasible,  all  future 
optimization  is  feasible,  which  means  the  vehicles  satisfy  all  the  constraints.  ■ 

Remark  1  (Anytime  algorithm):  Because  the  entire  vehicle  fleet  retains  feasibility  after  solving  each  subprob¬ 
lem,  the  algorithm  can  be  stopped  at  any  time  in  the  iteration. 


B.  Monotonic  Decrease  of  the  Fleet  Cost 

Another  important  property  of  CDRSBK  algorithm  is  that  the  fleet  objective  is  monotonically  decreasing  by 
solving  each  subproblem. 

Theorem  2:  The  fleet  objective  value  monotonically  decreases  by  solving  each  subproblem  Pvk  over  the  fleet 
(line  5  in  Algorithm  1)  and  over  the  time  (line  2). 

Proof:  The  proof  is  based  on  showing  that  the  candidate  solution  to  p’s  optimization  yields  an  objective  value 
that  is  no  worse  than  the  objective  value  found  in  the  optimization  of  the  previous  vehicle  p  -  1  (or  n  if  p  =  1). 

In  p’s  subproblem,  the  candidate  solution  (35)  does  not  change  variables  for  vehicles  Vg,  and  hence  yields  the 
same  local  cost  for  other  vehicles  Jq  that  was  found  in  the  optimization  by  p  -  i .  For  the  cost  of  vehicle  p  itself, 
with  the  assumption  of  nilpotency  LPN_  l  =  0,  the  terminal  states  can  be  written  as 

^k+N\k  ~  APXP+N_1\k-1  +  BpKp(xPk+N_1  |fc_1) 

using  (34b)-(34d).  The  individual  objective  value  Jp  depends  only  on  the  point  Op  in  the  invariant,  which  can 
remain  at  the  same  location  using  the  invariant  terminal  controller  kp{-)  (e.g.  loitering  in  a  circle  with  a  constant 
turn  radius,  or  hovering).  This  implies  the  candidate  solution  (34)  produces  the  same  individual  cost  Jp  obtained 
at  the  previous  optimization.  Because  p’s  optimization  can  use  the  candidate  solution  to  achieve  the  same  fleet  cost 
obtained  by  the  previous  vehicle  (p  —  1),  the  fleet  cost  can  only  improve  by  optimization.  ■ 

Note  that  this  does  not  mean  the  individual  cost  decreases  monotonically.  The  simulation  results  in  Section  VII 
and  VIII  demonstrate  a  temporary  increase  of  the  individual  cost  that  leads  to  a  greater  reduction  of  the  fleet  cost. 


A.  Vehicle  Model 


VII,  Simulation  Results 


The  simulation  uses  homogeneous  n  vehicles.  A  point-mass  model  is  used  to  approximate  the  translational 
dynamics  of  UAVs 


r 

v 


p 

k+ 1 
P 

fc+1  . 


=  Ap 


+  Bpap  +  wp 


Ap  = 


h 

At  1 2 

,  Bp  = 

[  (Af  1 

02 

12 

> 

to 

1 _ 

(36a) 

(36b) 


where  rp,  vp,  and  ap  are  the  position,  the  velocity,  and  the  acceleration  vector  respectively.  Matrices  I2  and  02 
express  an  identity  matrix  and  a  zero  matrix  of  size  2  respectively.  The  disturbance  wp  enters  through  the  input 

wpk  e  Wp  =  {w  \  w  =  Bpn,  n  6  R2,  IHI^  <  wmax}.  (37) 

The  local  constraints  include  the  obstacle  avoidance,  the  maximum/minimum  speed,  and  the  maximum  input 
constraints 


r\iO. 

^min  5:  —  ^max 

||  2  —  Gmax 

where  O  C  i2  expresses  the  no-fly  zones,  and  vmin^max^max  are  the  minimum  speed,  maximum  speed,  and 
maximum  acceleration  of  the  vehicle. 
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Fig.  6.  The  evolution  of  plans  over  the  iteration  for  the  simple  two  vehicle  example. 


B.  Iteration  Within  Each  Time  Step 

The  first  set  of  simulation  is  based  on  the  simplified  problem  setup  to  evaluate  the  solution  within  a  single  time 
step.  Only  the  convex  constraints  are  considered,  so  no  binary  variables  are  used.  Furthermore,  zero  disturbance  is 
assumed  and  hence  no  constraints  are  tightened. 

I)  Simulation  Setup:  In  this  simulation,  A l  in  (36)  is  set  to  1.  The  parameters  for  the  constraints  are  i.’mm  = 
0,umax  =  0.35,  amax  =  0.18,  and  ||r£||2  <  1.  The  systems  are  coupled  with  two  neighbors  through  the  following 
position  constraints. 


n  -  r 


,p+l 


<  0.8,  p  =  1, . . . ,  n  —  1 


|K-ri||2<0.8 

These  two-norm  constraints  are  expressed  as  a  combination  of  linear  constraints.  The  cost  direction  for  the  p,h 
vehicle  is 

cP  =  [  C0S(E^1) 

The  overall  cost  function  to  minimize  is  quadratic,  and  has  a  stage  cost  and  the  terminal  penalty 


Y.  Y1  {Xf  RlXl  +  UlT  R  2«fc  }  +  CpT'r\ N  +  rPNT  HrN 

p=  1  k~0 

where  the  weights  on  the  states  Ri  and  inputs  R-i  in  the  stage  cost  are  chosen  to  be  much  smaller  than  the  weight  H 
on  the  terminal  position.  All  the  vehicles  are  initially  at  the  origin.  Both  the  centralized  and  the  local  optimization 
are  written  as  quadratic  programming,  and  CPLEX  9.1  is  used  as  a  solver. 

2)  Simple  Two  Vehicle  Case:  The  first  example  involves  two  vehicles  p  and  q  that  can  move  on  a  two  dimensional 
plane.  The  terminal  position  of  the  vehicle  p  has  its  local  minimum  at  coordinates  (0,  0.7),  i.e., 

=  argmin{c,Tr^  +  rpNT  HrpN  }, 

rN  1  ’ 


0 

0.7 
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Fig.  7.  Final  plans  for  five  vehicles 


and  that  of  the  vehicle  q  is  at  (0,  -0.7).  Because  the  two  vehicles  must  satisfy  the  separation  constraint  of  0.8, 
their  separate  objectives  are  conflicting.  The  planning  horizon  is  three  steps  for  both  vehicles. 

Figure  6  shows  the  evolution  of  plans  over  two  iterations.  The  plans  of  vehicle  p  are  marked  with  o,  and  those  of 
vehicle  q  are  marked  with  x .  Originally,  both  vehicles  are  at  the  origin.  First,  vehicle  p  solves  its  local  optimization. 
Because  no  coupling  constraints  are  active  at  this  point,  the  terminal  position  reaches  the  local  minimum  (0,  0.7). 
Vehicle  q  then  solves  its  optimization,  but  given  the  separation  constraint,  this  vehicle  can  only  plan  to  move  to 
(0,  -0.1),  as  shown  in  the  second  part  of  the  figure. 

The  vehicle  p  solves  the  next  optimization,  but  since  a  coupling  constraint  has  become  active,  it  uses  a  parameter¬ 
ized  decision  for  q  with  a  variable  of  dimension  m—  1.  The  bottom  figure  shows  the  plans  after  two  iterations. 
The  final  plans  are  the  same  as  the  globally  optimal  centralized  solution. 

If  the  decentralized  non-cooperative  algorithm  in  Section  IV  were  used,  it  would  produce  a  Pareto  optimal  solution 
shown  in  the  second  figure  of  Figure  6,  which  is  clearly  not  the  globally  optimal  solution  shown  in  the  bottom.  Note 
that  if  the  vehicle  q  plans  first  followed  by  vehicle  p,  the  non-cooperative  algorithm  results  in  a  symmetric  Pareto 
optimal  solution,  which  again  is  not  the  globally  optimal  solution.  This  example  clearly  shows  the  performance 
improvement  over  the  decentralized  non-cooperative  approach. 
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Fig.  8.  Comparison  of  three  algorithms  in  terms  of  performance  and  computation. 


3)  Five  Vehicle  Case:  Figure  7  shows  a  more  complex  case  with  five  vehicles.  In  this  example,  the  local  minimum 
for  each  vehicle  is  located  on  a  unit  circle  centered  at  the  origin.  The  planning  horizon  is  three  .steps  for  all  vehicles, 
and  the  planning  order  is  1— >3— >5— >2— >4  to  highlight  the  effect  of  the  planning  order  on  the  performance. 

Two  other  algorithms  are  used  as  benchmarks.  These  are:  1)  the  centralized  approach  in  Section  III  that  provides 
the  globally  optimal  solution,  and  2)  the  decentralized  non-cooperative  approach  in  Section  IV  that  produces  a 
locally  optimized  solution. 

As  shown  in  Figure  7(b),  the  decentralized  non-cooperative  approach  produced  a  suboptimal  solution,  because 
the  vehicles  that  plan  earlier  are  less  constrained  and  have  more  region  to  operate  than  the  vehicles  that  plan  later 
in  the  cycle.  The  decentralized  cooperation  algorithm  produced  the  trajectories  shown  in  Figure  7(c)  whose  shape 
are  very  similar  to  the  centralized  solution  shown  in  Figure  7(a). 

4)  Perfonnance  and  Computation:  Figure  8  compares  the  global  objective  value  and  the  cumulative  computation 
time  of  three  algorithms  for  the  five  vehicle  example.  Different  lengths  of  the  planning  horizon  N  =  4,6,8  were 
considered  to  investigate  the  scalability  of  the  algorithms. 

The  solutions  of  the  decentralized  non-cooperative  approach  are  marked  with  *.  Although  the  computation  time 
is  small,  the  cost  is  fairly  high.  The  centralized  (and  hence  globally  optimal)  solutions  are  marked  with  o.  The  lines 
with  x  show  the  evolution  of  the  global  cost  of  the  decentralized  cooperation  algorithm.  The  plot  starts  from  the 
end  of  the  first  iteration  when  every  vehicle  has  its  solution  and  continues  to  the  end  of  the  second  iteration.  This 
proposed  algorithm  has  objective  values  comparable  to  those  of  the  centralized  solution  but  scales  better  than  the 
centralized  solution  when  the  problem  size  increases. 

Figure  9  shows  cases  with  more  vehicles  (n  =  5,  7,  10,  15).  The  decentralized  non-cooperative  approach  has 
much  higher  cost  and  is  out  of  the  range  of  the  plot.  For  the  centralized  and  the  proposed  approach,  the  differences 
in  the  computation  time  scale  up  significantly  for  larger  fleets.  Note  that  in  all  the  plots  of  Figures  8  and  9,  the  lines 
of  the  proposed  approach  are  monotonically  decreasing,  which  validates  the  result  in  Section  VI-B  by  simulation. 
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Fig.  9.  Trade-off  between  the  performance  and  the  computation  time.  From  the  top  to  the  bottom,  the  number  of  vehicles  are  5,  7,  10,  and  15. 


C.  Full  CDRSBK  Algorithm 

This  section  presents  the  performance  comparison  of  DRSBK  and  CDRSBK  through  simulation. 

1)  Setup:  The  simulation  uses  fixed-wing  UAVs,  whose  dynamics  are  (36)  with  At  =  5.  The  disturbance 
magnitude  wmax  is  5%  of  the  control  authority  amax.  The  planning  horizon  length  N  is  4.  The  parameters  for 
dynamic  constraints  are:  vmm  =  18,  vmax  =  24,  amax  =  3.84.  A  two-step  nilpotent  controller  is  used  for  this 
system  to  tighten  the  constraints  in  (14)  and  (16),  and  the  parameters  for  constraint  tightening  are  obtained  through 
(39)  in  the  Appendix 

Qo  =  0,  Po  —  0,  7o  =  0, 

c*i  =  2.4,  Pi  =  1.4,  7i  =  0.54, 

otj  =  4.8,  Pj  =  2.7,  7 j  =  0.81,  j  >  2. 

When  the  coupling  constraints  are  active,  the  parameterization  matrix  Tq  is  calculated  from  (25) 
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Fig.  10.  Comparison  of  trajectories  executed. 
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showing  the  “tall”  parameterization  matrix  that  reduces  the  dimension  of  q’s  decision  space.  Although  the  objective 
function  can  be  an  arbitrary  function  of  all  vehicles’  decisions,  the  goal  of  the  trajectory  optimization  is  assumed 
to  minimize  the  mission  completion  time  of  the  fleet  in  this  example.  The  summation  of  the  individual  cost  is  also 
included  with  a  small  penalty  e,  so  that  the  vehicles  that  complete  the  mission  before  the  last  vehicle  also  minimize 
the  individual  completion  time. 

n 

J  (J°, . . . ,  Jn)  =  max  JP(UP)  +  e  ^  JP(UP)  (38) 

P  p= i 

2)  Results:  The  scenario  considers  two  vehicles  trying  to  reach  their  own  targets  (marked  with  □)  while  avoiding 
obstacles  and  the  other  vehicle.  The  goal  is  to  minimize  the  mission  completion  time  with  a  small  penalty  e  =  0.05 
on  the  individual  cost  in  (38).  Figure  10  shows  the  trajectories  generated  by  DRSBK  and  CDRSBK  algorithms. 
The  trajectory  of  vehicle  1  is  marked  with  x,  and  that  of  vehicle  2  is  marked  with  A.  Because  vehicle  1  has  to 
traverse  a  longer  route,  the  optimal  solution  is  for  vehicle  2  to  move  over,  which  is  the  behavior  achieved  by  the 
CDRSBK  algorithm.  Both  distributed  algorithms  maintained  feasibility  under  the  action  of  disturbances.  However, 
DRSBK  subproblem  solely  minimizes  the  individual  cost  without  considering  the  performance  of  the  other  vehicle, 
making  no  improvement  on  the  cost  for  some  time. 

This  cooperative  behavior  is  also  seen  in  the  plot  for  the  objective  values.  Figure  1 1  shows  the  time  history  of  the 
individual  cost  Jp  and  the  fleet  cost  J.  Both  algorithms  monotonically  decrease  the  fleet  objective.  As  shown  in  the 
right  figure,  the  cooperative  formulation  allows  the  individual  cost  to  increase  if  it  leads  to  a  larger  improvement  of 
the  fleet  cost.  Between  optimization  #14-17  (which  correspond  to  time  7-9),  the  vehicle  with  a  better  cost  (vehicle 
2)  yields  to  the  vehicle  with  a  worse  cost  (vehicle  1),  enabling  a  large  reduction  in  the  fleet  cost  J.  The  average 
computation  time  for  solving  MILP  in  this  scenario  was  0.050  second  for  DRSBK  and  0.064  second  for  CDRSBK. 

VIII.  Hardware  Results 

A  similar  scenario  is  tested  using  an  unique  quadrotor  testbed  developed  at  the  Aerospace  Controls  Laboratory 
of  MIT. 

A.  Quadrotor  Testbed 

This  section  briefly  describes  the  quadrotor  testbed.  More  details  are  available  in  the  recent  article  [32],  As  a 
sensing  device,  the  testbed  uses  a  Vicon  motion  capture  system  [34],  which  are  the  cameras  shown  in  background 
of  Figure  12.  The  Vicon  system  provides  a  position  estimate  of  sub-millimeter  accuracy  at  100  Hz,  and  the  filtered 
time  difference  gives  a  velocity  estimate  of  1  cm/s  peak-to-peak  accuracy.  The  vehicles  are  commercially  available 
Draganflyer  V  Ti  Pro  [35],  and  no  significant  modifications  was  required  to  the  hardware  to  fly  them  autonomously. 
Several  lightweight  reflective  markers  are  attached  to  each  vehicle  in  a  unique  configuration,  which  the  Vicon  system 
uses  to  track  the  position  and  orientation  of  each  vehicle  in  the  room.  The  low-level  controller  is  designed  to  track 
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non-cooperative  cooperative 


Fig.  11.  Time  history  of  the  objective  function 


0 

o 


Fig.  12.  Quadrotor  testbed  using  Vicon  system  [32].  [33] 


waypoints  which  are  provided  in  real-time  by  the  planner.  The  waypoint  follower  was  designed  using  standard  LQR 
techniques,  which  calculates  the  motor  commands  of  each  rotor  off-board  and  sends  them  to  the  quadrotor  using 
an  R/C  transmitter  [32], 

A  planning  laptop  is  assigned  to  each  vehicle,  and  the  inter-vehicle  communication  is  implemented  as  a  com¬ 
munication  over  the  TCP/IP  network. 
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-3-2-10123 
x  [m] 


(a)  Vehicle  1 


-3-2-1  0  1  2  3 

x[m] 

(b)  Vehicle  2 


Fig.  13.  The  plans  and  the  trajectories  of  two  quadrotors  from  the  CDRSBK  algorithm  experiment 


B.  Result 

The  objective  of  this  experiment  is  to  demonstrate  that  the  algorithm  can  generate  online  a  trajectory  using  receding 
horizon  techniques.  Many  disturbances  sources  exist  in  the  hardware  experiments,  such  as  air  flow,  modeling  error 
of  the  vehicle,  sensing  noise,  communication  delay,  and  imperfect  tracking  of  the  low-level  controller.  The  CDRSBK 
algorithm  must  account  for  these  uncertainties  to  robustly  satisfy  the  constraints.  The  following  parameters  were 
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used  in  the  algorithm. 

At  =  2.5  sec,  N  =  6 

7~i  =  1.3  sec,  T2  =  0.7  sec 

Umax  =  0.30  m/s,  amax  =  0.45  m/s2 

wr  —  0.27  m,  wv  —  0.09  m/s 

where  rj  and  r2  are  the  propagation  time  for  vehicle  1  and  2,  as  shown  in  Figure  5.  Here,  the  disturbance  enters 
separately  for  position  and  velocity.  Because  the  quadrotors  can  hover,  the  full  stop  was  used  as  the  terminal  safety 
constraints.  The  procedure  in  [26]  produced  the  following  constraint  contraction  parameters  in  (39). 

a0  =  0,  /?o  =  0,  70  =  0, 

ai  =  0.27,  f3\  =  0.09,  7i  =  0.067, 

a2  =  0.428,  fo  =  0.198,  72  =  0.106, 

a3  =  0.440,  0j  =  0.208,  7,  =  0.110,  j>  3 

The  vehicles  1  and  2  started  around  (1.5,  3.5)  and  (—1.2,  2.5)  respectively.  The  targets  for  vehicle  1  are 
(—1.5,  3.0)  are  (-2.4,  5.0),  and  the  targets  for  vehicle  2  are  (1.2,  2.3)  and  (2.0, 3.5),  which  are  all  marked  with 
IXI.  The  vehicles  must  switch  the  position  while  avoiding  the  other  vehicle  and  the  obstacle  in  the  middle. 

Figure  13  shows  this  scenario  and  the  plot  of  the  trajectory  of  each  vehicle.  The  red  thick  line  is  the  actual 
trajectory  of  the  vehicle,  which  were  recorded  at  2  Hz.  All  the  plans  generated  in  the  receding  horizon  framework 
are  shown  with  blue  lines. 

As  a  result  of  their  initial  locations,  vehicle  2  approaches  the  bottom  of  the  obstacle  before  vehicle  1,  and  vehicle 
2  then  tries  to  go  along  the  bottom  of  the  obstacle,  as  the  planned  trajectories  in  Figure  13(b)  show.  This  would 
have  the  effect  of  delaying  vehicle  1,  which  has  targets  that  are  further  away,  and  already  has  a  longer  mission  to 
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execute  than  vehicle  2.  Thus,  vehicle  2  yields  way  to  vehicle  1  to  minimize  the  fleet  mission  completion  time  in 
(38).  This  cooperative  effect  is  also  shown  in  Figure  14,  which  plots  the  objective  values  of  the  first  15  plans.  Note 
that  between  plans  5  and  9,  there  is  a  temporary  increase  of  the  cost  for  vehicle  2,  but,  even  under  the  action  of 
disturbances,  the  total  objective  value  is  monotonically  decreasing.  The  average  computation  time  for  each  local 
optimization  on  a  2.4  GHz  laptop  was  0.31  second.  This  flight  test  successfully  demonstrated  the  cooperative 
behavior  by  the  distributed  online  planning  algorithm. 

IX.  Conclusions 

This  paper  presented  a  robust  decentralized  trajectory  optimization  algorithm  that  includes  explicit  cooperation. 
Each  vehicle  sequentially  solves  the  subproblem,  but  the  subproblem  also  includes  the  global  objective  and  feasible 
modifications  to  other  vehicles’  plans.  The  overall  optimization  is  written  in  MILP.  In  order  to  maintain  the 
scalability  of  the  algorithm,  continuous  variables  of  neighboring  vehicles  are  parameterized  using  a  variable  of 
smaller  dimension,  while  most  binary  variables  are  fixed.  It  is  shown  to  guarantee  the  robust  feasibility  and  the 
monotonic  decrease  of  the  global  cost.  Simulation  results  showed  that  the  proposed  algorithm  scales  much  better 
than  the  centralized  approach  and  the  performance  is  much  better  than  that  of  the  non-cooperative  approach  with 
a  marginal  increase  in  the  computation.  The  algorithm  was  implemented  on  the  quadrotor  testbed,  and  the  various 
features  of  the  algorithm  have  been  successfully  demonstrated. 

Appendix 

The  following  describes  the  MILP  implementation  of  the  CDRSBK  algorithm.  In  this  section,  the  disturbance  is 
assumed  to  be  infinity-norm  bounded,  i.e.,  Wp  =  {Gw  |  UtoH^  <  u>max} 

A.  Constraint  Tightening  for  Robustness 

The  constraint  tightening  in  114s)  and  flfit  are  imnlemented  hv  using  the  following  constraint  contraction  rtaram- 
eters  [36] 

a0  =  0,  aj  =  cl,- i  +  ||[l  0  0  OjL^G’l^uWx,  j  >  1 

Po  =  0.  0i  =  0j-i  +  ||  [0  0  1  0]Lj_1G||1«jrnax,  j  >  1 

7o  =0,  jj  =  7j_j  +  ||[1  0]Kj_1Lp_1G\\1wmax,  j  >  1  (39) 

where  aj,  f3j,  and  7 j  respectively  represents  the  constraint  contraction  for  position,  velocity,  and  input  for  the  jlh 

prediction  step. 

B.  Output  Constraint  Set  (9) 

For  each  point  [x,  y]T  and  each  rectangular  shaped  obstacle  defined  by  two  comers  [xiow,  t/iow]T  and  [.rh,gh,  !/high]7', 
the  obstacle  avoidance  constraints  can  be  written  using  binary  variables 


Vo,  Vj  :  xvk+j]k  <  x,ow.0  -  ctj  +  M  tPoba  jol 

(40a) 

yVk+j\k  -  Wow.o  -  aj  +  M  ^0bst,jo2 

(40b) 

Xk+j\k  -  ^high.o  +  aj  ~  A/  ^bst ,j03 

(40c) 

vl+j\k  -  ^i>'8h  o  +  ai  A'^^bst.Jo4 

A 

(40d) 

4 

^obst ,joi  ^  3 

(40e) 

!=1 


where  M  is  a  large  number  to  relax  the  constraints  in  (40a)-(40d),  and  o  denotes  the  index  of  the  obstacle.  The 
logical  constraint  (40e)  requires  at  least  one  constraint  in  (40a)-(40d)  be  active.  Note  that  the  parameter  a,  tightens 
the  constraints. 

The  output  constraint  (9)  also  includes  the  bound  on  speed  and  inputs.  Let  vectors  r.v,  and  a  respectively 
represent  position,  velocity,  and  acceleration  input  in  the  inertia  frame.  A  set  of  n j  linear  constraints  approximates 
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the  two-norm  bounded  constraints  on  the  acceleration  and  velocity  vectors,  which  in  turn  limits  the  maximum 
turning  rate 

[cos  0m ,  sin  0m]  vpk+i\k  <  umax  -  V2 Pj  (41a) 

[cos(9m,  sin6»m]  ap+j^  <  amax  -  V 2 (41b) 
27T771 

0  m  1  v  TJX  1 ,  .  .  .  ,  Tid 

rid 

The  scaling  \/2  is  multiplied  to  account  for  the  effect  of  the  infinity-norm  bounded  disturbance  on  the  two-norm 
bounded  constraints.  The  minimum  speed  constraint  is  non-convex  and  requires  nv  binary  variables  to  express  in 
MILP 


[cos  9m,  sin  0mi\‘Vk+.,k  ^  t/min  *  V^Ipj  2'Umaxfi[[e|^ 


jm 


E  ^1.™  ^ n*  - 1 


(42a) 

(42b) 


For  vehicles  q(>  p),  the  subscript  j  in  the  the  constraint  tightening  parameters  aj .  Pj ,  jj  must  be  replaced  with 
j  +  1,  as  shown  in  (28)-(29). 


C.  Invariance  Constraints  (12) 

For  the  fixed-wing  aircraft,  the  vehicle  has  an  option  to  enter  left  or  right  loiter  circle  from  the  terminal  states. 
The  center  of  the  left  and  right  safety  circles  are 


Op  = 


opR 


k+N  |fc 


+  *(i) 
(-1) 


P 


PN  k+N^k 


+  R 


k+N 

where  R(d)  is  a  rotation  matrix  of  angle  6,  and  p  is  the  radius  of  the  turning  circle  given  by 


(43a) 

(43b) 


(%ax  P  n')'^  Hu 


Pn 


In 


The  second  term  accounts  for  the  variability  of  the  terminal  speed  ||i/' 
either  the  left  or  right  safety  circle 


+  Pn 

p  I 
fc+JV|fcl 


The  binary  variable  chooses 


Op-2(P  +  aN-1)(l-bpn) 

<  Op  <  Op  +  2(p  +  —  Cft) 

(44a) 

^r  —  2(p  +  ajv-i)hfc  f, 

<  Op  <OpR+2(p  +  aN-1)Kf< 

(44b) 

where  the  cost-to-go  (18)  is  evaluated  from  Op.  With  the  notation  Op  =  [xcenter,  t/center]T> 
constraints  of  the  safety  circle  are  written  as 

the  obstacle  avoidance 

Vo  :  xpenter  <  xlow,0  ~(p  +  ajv-t)  +  M  h[’lrc.obs,o| 

(45a) 

1/center  —  ?/low,0  —  (p  +  <+N  —  1 )  V  3/  |J'[[rc.0bst_o.j 

(45b) 

■^center  >  ®high.0  +  (p  +  «Af-l)  “  A/  C-obst.o3 

(45c) 

1/center  —  1/high. 0  +  (p  +  aN-l)  —  A/ hfirc_obs|  q4 

A 

(45d) 

4 

E^rc-obst,,,  <  3 
?  =  1 

(45e) 
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D.  Interconnected  Constraints  (11) 

Over  the  planning  horizon,  the  coupling  constraints  include  vehicle  avoidance  constraints 


Xk+j\k  —  Xk+j\k  ~  Cal  +  MKeh,jl 

(46a) 

yPk+j\k  —  Vk+j\k  ~  Cal  +  ^  Keh ,j2 

(46b) 

Xk+j\k  -  Xk+j\k  +  Cal  “  ^  Keh i,j3 

(46c) 

y?k+j\k  -  yl+j\k  +  Cal  _  ^ 

(46d) 

EC,,,<3 

i- 1 

(46e) 

where 

dpq  \  2d  +  2a j ,  q<p 

\^2d  +  ctj  +  ctj+i ,  q  >  p 

where  d  is  the  size  of  the  vehicle  region  that  other  vehicle  must  not  enter.  Beyond  the  planning  horizon,  constraints 

on  the  safety  circles  ensure  the  vehicle  avoidance 

Center  <  ^center  -  2(p  +  d  +  OLN- 1 )  +  M  (£?c ^ 

(47a) 

1/center  <  2/center  -  2(p  +  d  +  QjV-l)  +  M  />™2 

(47b) 

Center  >  ^center  +  2(p  +  d  +  aN  _j )  -  M 

(47c) 

2/center  >  2/center  +  2(p  +  d  +  aN-i)  -  M  4 

(47d) 

4 

i=l 

(47e) 

E.  Objective  Function 

The  objective  function  (18)  uses  a  binary  variable  6VjS  to  select  one  visible  point 

rVjS  from  a  list  of  cost  points. 

from  which  the  cost-to-go  is  known.  Let  rcPi;  denote  the  ilh  cost  point  and  i  =  1, . 
of  cost  points  stored  in  the  cost  map.  Then, 

. . ,  nc p  where  ncp  is  a  number 

7lCp 

rp-  =  N  b^-  rp 

vis  vvis >i  cp,  j 

1=1 

(48a) 

ECr' 

i=l 

^cp 

(48b) 

i=l 

(48c) 

Jv  >  [cos dm-,  s\n9m](Op  -  rps)  +  /p(rpis),  Vm 

(48d) 

where  the  cost  /p(r£p,i)  from  each  cost  point  to  the  target  of  vehicle  p  is  calculated  prior  to  MILP  and  is  constant 
in  MILP. 

To  ensure  the  selected  cost  point  rpis  is  visible  from  a  point  Op  in  the  invariant  set  Qpk ,  obstacle  avoidance 
constraints  are  enforced  on  njnt  interpolation  points  that  are  placed  on  the  line  connecting  rpjs  and  Op 


^low.0  &N—  1 

(49a) 

yiow.0  —  1 

(49b) 

•T'high.Q  &N—  1 

(49c) 

2/high.  0  + 

(49d) 
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XX„<  3  («<=) 

i— 1 

IH  ~~  >  l  —  1,  .  .  . 

^int 


F.  Decision  Variables 

The  inputs  are  constrained  to  be 

V?:  KcV-  =  K+T^.  (50) 

The  binary  variables  are  fixed  if  the  superscript  does  not  include  p,  except  for  bfeft  which  selects  the  left/right  safety 
circle 


bq 

yobst 

-  U<1 
wobst 

(51a) 

bq 

circ-obst 

_  J.Q 

circ-obst 

(51b) 

*4 

=  4 

(51c) 

=  bl 

(5  Id) 

=  ^veh  Vr  ¥=  P, 

(5 1  e) 

Cc-circ 

=  ^circ-circ '  Vr  +  P.  r^Q 

(5  If) 

In  summary,  the  MILP  implementation  of  subproblem  Ppk  is  to  minimize  (6)  subject  to  (7)-(8),  (40)— (5 1 ). 
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